Abstract. It is well-known that every first-order property on words is expressible using at
most three variables. The subclass of properties expressible with only two variables is also 
quite interesting and well-studied. We prove precise structure theorems that characterize 
the exact expressive power of first-order logic with two variables on words. Our results 
apply to both the case with and without a successor relation. 

For both languages, our structure theorems show exactly what is expressible using a 
given quantifier depth, $n$, and using $m$ blocks of alternating quantifiers, for any $m \le n$.
Using these characterizations, we prove, among other results, that there is a strict hierarchy 
of alternating quantifiers for both languages. The question whether there was such a 
hierarchy had been completely open. As another consequence of our structural results, 
we show that satisfiability for first-order logic with two variables without successor, which 
is NEXP-complete in general, becomes NP-complete once we only consider alphabets of a 
bounded size. 



1. Introduction 

It is well-known that every first-order property on words is expressible using at most 
three variables 018]. The subclass of properties expressible with only two variables is also 
quite interesting and well-studied (Fact 1.1).

In this paper we prove precise structure theorems that characterize the exact expressive 
power of first-order logic with two variables on words. Our results apply to $\mathrm{FO}^2[<]$ and
$\mathrm{FO}^2[<,\mathrm{Suc}]$, the latter of which includes the binary successor relation in addition to the
linear ordering on string positions.

For both languages, our structure theorems show exactly what is expressible using
a given quantifier depth, $n$, and using $m$ blocks of alternating quantifiers, for any $m \le n$.
Using these characterizations, we prove that there is a strict hierarchy of alternating
quantifiers for both languages. The question whether there was such a hierarchy had been 
completely open since it was asked in pi [5. As another consequence of our structural results, 
we show that satisfiability for $\mathrm{FO}^2[<]$, which is NEXP-complete in general, becomes
NP-complete once we only consider alphabets of a bounded size.

Our motivation for studying $\mathrm{FO}^2$ on words comes from the desire to understand the
trade-off between formula size and number of variables. This is of great interest because,
as is well-known, this is equivalent to the trade-off between parallel time and number of
processors [6]. Adler and Immerman [1] introduced a game that can be used to determine
the minimum size of first-order formulas with a given number of variables needed to express 
a given property. These games, which are closely related to the communication complexity 
games of Karchmer and Wigderson [9], were used to prove two optimal size bounds for 
temporal logics [1]. Later Grohe and Schweikardt used similar methods to study the size 
versus variable trade-off for first-order logic on unary words [5]. They proved that all first- 
order expressible properties of unary words are already expressible with two variables and 
that the variable-size trade-off between two versus three variables is polynomial whereas the 
trade-off between three versus four variables is exponential. They left open the trade-off 
between $k$ and $k+1$ variables for $k \ge 4$. While we do not directly address that question
here, our classification of $\mathrm{FO}^2$ on words is a step towards the general understanding of the
expressive power of FO needed for progress on such trade-offs.

Our characterization of $\mathrm{FO}^2[<]$ and $\mathrm{FO}^2[<,\mathrm{Suc}]$ on words is based on the very natural
notion of $n$-ranker (Definition 3.2). Informally, a ranker is the position of a certain
combination of letters in a word. For example, $\triangleright_a$ and $\triangleleft_b$ are 1-rankers where
$\triangleright_a(w)$ is the position of the first $a$ in $w$ (from the left) and $\triangleleft_b(w)$ is the position of
the first $b$ in $w$ from the right. Similarly, the 2-ranker $r_2 = \triangleright_a \triangleright_c$ denotes the position
of the first $c$ to the right of the first $a$, and the 3-ranker $r_3 = \triangleright_a \triangleright_c \triangleleft_b$ denotes the
position of the first $b$ to the left of $r_2$. If there is no such letter then the ranker is
undefined. For example, $r_3(cababcba) = 5$ and $r_3(acbbca)$ is undefined.

Our first structure theorem (Theorem 3.8) says that the properties expressible in
$\mathrm{FO}^2_n[<]$, i.e. first-order logic with two variables and quantifier depth $n$, are exactly boolean
combinations of statements of the form, "$r$ is defined", and "$r$ is to the left (right) of $r'$"
for $k$-rankers, $r$, and $k'$-rankers, $r'$, with $k \le n$ and $k' < n$. A non-quantitative version of
this theorem was previously known [13].^1 Furthermore, a quantitative version in terms of
iterated block products of the variety of semi-lattices is presented in [16], based on work by
Straubing and Thérien [14].

Surprisingly, Theorem 3.8 can be generalized in almost exactly the same form to
characterize $\mathrm{FO}^2_{m,n}[<]$ where there are at most $m$ blocks of alternating quantifiers, $m \le n$. This
second structure theorem (Theorem 4.5) uses the notion of $(m,n)$-ranker where there are $m$
blocks of $\triangleright$'s or $\triangleleft$'s, that is, changing direction in rankers corresponds exactly to alternation
of quantifiers. Using Theorem 4.5 we prove that there is a strict alternation hierarchy for
$\mathrm{FO}^2[<]$ (Theorem 4.11) but that at most $|\Sigma| + 1$ alternations are useful, where $|\Sigma|$
is the size of the alphabet (Theorem 4.7).

The language $\mathrm{FO}^2[<,\mathrm{Suc}]$ is more expressive than $\mathrm{FO}^2[<]$ because it allows us to talk
about consecutive strings of symbols.^2 For $\mathrm{FO}^2[<,\mathrm{Suc}]$, a straightforward generalization of
$n$-ranker to $n$-successor-ranker allows us to prove exact analogs of Theorems 3.8 and 4.5.
We use the latter to prove that there is also a strict alternation hierarchy for $\mathrm{FO}^2[<,\mathrm{Suc}]$
(Theorem 5.6). Since in the presence of successor we can encode an arbitrary alphabet in
binary, no analog of Theorem 4.7 holds for $\mathrm{FO}^2[<,\mathrm{Suc}]$.

^1 See item (7) in Fact 1.1; a "turtle language" is a language of the form "$r$ is defined", for some ranker, $r$.
^2 With three variables we can express $\mathrm{Suc}(x, y)$ using the ordering: $x < y \wedge \forall z\,(z \le x \vee y \le z)$.

The expressive power of first-order logic with three or more variables on words has been 
well-studied. The languages expressible are of course the star-free regular languages [10].
The dot-depth hierarchy is the natural hierarchy of these languages. This hierarchy is strict
[2] and identical to the first-order quantifier alternation hierarchy [18, 19].

Many beautiful results on $\mathrm{FO}^2$ on words were also already known. The main significant
outstanding question was whether there was an alternation hierarchy. The following is a
summary of the main previously known characterizations of $\mathrm{FO}^2[<]$ on words. For a detailed
treatment of all these characterizations, we refer the reader to p^. 

Fact 1.1. [allllllllliailZlIIS] Let $R \subseteq \Sigma^*$. The following conditions are equivalent:

(1) $R \in \mathrm{FO}^2[<]$

(2) $R$ is expressible in unary temporal logic

(3) $R \in \Sigma_2 \cap \Pi_2[<]$

(4) $R$ is an unambiguous regular language

(5) The syntactic semi-group of $R$ is a member of DA

(6) $R$ is recognizable by a partially-ordered 2-way automaton

(7) $R$ is a boolean combination of "turtle languages"

The proofs of our structure theorems are self-contained applications of
Ehrenfeucht-Fraïssé games. All of the above characterizations follow from these results. Furthermore,
we have now exactly connected quantifier and alternation depth to the picture, thus adding 
tight bounds and further insight to the above results. 

For example, one can best understand item (4) above, that $\mathrm{FO}^2[<]$ on words corresponds
to the unambiguous regular languages, via Theorem 3.12, which states that any $\mathrm{FO}^2_n[<]$
formula with one free variable that is always true of at most one position in any string
necessarily denotes an $n$-ranker.

In the conclusion of [13] , the authors define the subclasses of rankers with one and two 
blocks of alternation. They write that, ". . . turtle languages might turn out to be a helpful 
tool for further studies in algebraic language theory." We feel that the present paper fully 
justifies that prediction. Turtle languages, aka rankers, do provide an exceptionally
clear and precise understanding of the expressive power of $\mathrm{FO}^2$ on words, with and without
successor.

In summary, our structure theorems provide a complete classification of the expressive
power of $\mathrm{FO}^2$ on words in terms of both quantifier depth and alternation. They also tighten
several previous characterizations and lead to the alternation hierarchy results. 

We begin the remainder of this paper with a brief review of logical background
including Ehrenfeucht-Fraïssé games, our main tool. In Sect. 3 we formally define rankers and
present our structure theorem for $\mathrm{FO}^2_n[<]$. The structure theorem for $\mathrm{FO}^2_{m,n}[<]$ is covered
in Sect. 4, including our alternation hierarchy result that follows from it. Sect. 5 extends
our structure theorems and the alternation hierarchy result to $\mathrm{FO}^2[<,\mathrm{Suc}]$. Finally, we
discuss applications of our structural results to satisfiability for $\mathrm{FO}^2[<]$ in Sect. 6.

2. Background and Definitions 

We recall some notation concerning strings, first-order logic, and Ehrenfeucht-Fraïssé
games. See [6] for more details, including the proof of Facts 2.1 and 2.2.

$\Sigma$ will always denote a finite alphabet and $\varepsilon$ the empty string. For a word $w \in \Sigma^\ell$ and
$i \in [1,\ell]$, let $w_i$ be the $i$-th letter of $w$; and for a subinterval $[i,j]$ of $[1,\ell]$, let $w_{[i,j]}$ be the
substring $w_i \ldots w_j$. Slightly abusing notation, we identify a word $w \in \Sigma^\ell$ with the logical
structure $w = (\{1,\ldots,\ell\};\ Q_a^w, a \in \Sigma;\ x^w;\ y^w)$. Here $Q_a$, $a \in \Sigma$ are all unary relation symbols,
and $x$ and $y$ are the only two variables. If not specified otherwise, we have $x^w = y^w = 1$ by
default, and for all $a \in \Sigma$, $Q_a^w = \{1 \le i \le \ell \mid w_i = a\}$. Furthermore, we write $(w,i,j)$ for
the word structure $w$ with the two variables set to $i$ and $j$, respectively, and $(w,i)$ for the
word structure $w$ with $x^w = i$. Thus $w = (w,1,1)$, and $(w,i) \models Q_a(x)$ iff $w_i = a$.

We use $\mathrm{FO}[<]$ to denote first-order logic with a binary linear order predicate $<$, and
$\mathrm{FO} = \mathrm{FO}[<,\mathrm{Suc}]$ for first-order logic with an additional binary successor predicate. $\mathrm{FO}^2_n$
refers to the restriction of first-order logic to use at most two distinct variables, and
quantifier depth $n$. $\mathrm{FO}^2_{m,n}$ is the further restriction to formulas such that any path in their parse
tree has at most $m$ blocks of alternating quantifiers, and $\mathrm{FO}^2\text{-ALT}[m] = \bigcup_{n \ge m} \mathrm{FO}^2_{m,n}$. We
write $u \equiv^2_n v$ to mean that $u$ and $v$ agree on all formulas from $\mathrm{FO}^2_n$, and $u \equiv^2_{m,n} v$ if they
agree on $\mathrm{FO}^2_{m,n}$.

We assume that the reader is familiar with our main tool: the Ehrenfeucht-Fraïssé
game. In each of the $n$ moves of the game $\mathrm{FO}^2_n(u,v)$, Samson places one of the two pebble
pairs, $x$ or $y$, on a position in one of the two words and Delilah then answers by placing that
pebble's mate on a position of the other word. Samson wins if after any move, the map
from the chosen points in $u$ to those in $v$, i.e., $x^u \mapsto x^v$, $y^u \mapsto y^v$, is not an isomorphism
of the induced substructures; and Delilah wins otherwise. The fundamental theorem of
Ehrenfeucht-Fraïssé games is the following:

Fact 2.1. Let $u, v \in \Sigma^*$, $n \in \mathbb{N}$. Delilah has a winning strategy for the game $\mathrm{FO}^2_n(u,v)$ iff
$u \equiv^2_n v$.

Thus, Ehrenfeucht-Fraïssé games are a perfect tool for determining what is expressible
in first-order logic with a given quantifier-depth and number of variables. The game
$\mathrm{FO}^2_{m,n}(u,v)$ is the restriction of the game $\mathrm{FO}^2_n(u,v)$ in which Samson may change which
word he plays on at most $m - 1$ times.

Fact 2.2. Let $u, v \in \Sigma^*$ and let $m, n \in \mathbb{N}$ with $m \le n$. Delilah has a winning strategy for
the game $\mathrm{FO}^2_{m,n}(u,v)$ iff $u \equiv^2_{m,n} v$.
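
For small words, the games of Facts 2.1 and 2.2 can be explored by exhaustive search. The following Python sketch is an illustration only and not part of the paper: the function name `delilah_wins`, the 0-based positions, and the encoding of the alternation bound as an optional parameter `m` are assumptions made here. It follows the default convention that both pebbles start on the first position of each word, so it requires non-empty words, and it covers only the order predicate, not successor.

```python
from functools import lru_cache


def order(i, j):
    """Order type of two positions: -1, 0 or 1 for <, = or >."""
    return (i > j) - (i < j)


def delilah_wins(u, v, n, m=None):
    """True iff Delilah wins the n-move two-pebble game on non-empty words u, v.

    If m is given, Samson may change the word he plays on at most m - 1 times.
    Positions are 0-based here; both pebbles start on the first position.
    """
    if m is None:
        m = n + 1  # so many blocks that the restriction never binds
    words = {"u": u, "v": v}

    def consistent(xu, yu, xv, yv):
        # The pebbled positions must carry the same letters and the same order.
        return (u[xu] == v[xv] and u[yu] == v[yv]
                and order(xu, yu) == order(xv, yv))

    def place(xu, yu, xv, yv, pebble, side, p, q):
        # Samson puts `pebble` on position p of word `side`, Delilah answers q.
        pu, pv = (p, q) if side == "u" else (q, p)
        return (pu, yu, pv, yv) if pebble == "x" else (xu, pu, xv, pv)

    @lru_cache(maxsize=None)
    def win(xu, yu, xv, yv, moves, last, switches):
        if not consistent(xu, yu, xv, yv):
            return False  # Samson has already won
        if moves == 0:
            return True
        for side in ("u", "v"):
            left = switches
            if last is not None and side != last:
                if switches <= 0:
                    continue  # Samson may not change words any more
                left = switches - 1
            other = "v" if side == "u" else "u"
            for pebble in ("x", "y"):
                for p in range(len(words[side])):
                    if not any(
                        win(*place(xu, yu, xv, yv, pebble, side, p, q),
                            moves - 1, side, left)
                        for q in range(len(words[other]))
                    ):
                        return False  # this move of Samson has no good reply
        return True

    return win(0, 0, 0, 0, n, None, m - 1)
```

For instance, on `"aab"` and `"ab"` Samson wins in one move by placing a pebble on the second `a` of `"aab"`, which Delilah cannot match to the right of the other pebble. The search is exponential in nothing but polynomial in the word lengths for fixed `n`, yet it is meant for experimentation on short words only.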

We end this section with a simple lemma that will be useful whenever we want to prove
that there is a formula expressing a property of strings. With this lemma, it suffices to
show that for any pair of strings, one with the property in question and one without, there
is a formula that distinguishes between these two particular strings.

Lemma 2.3. Let $P \subseteq \Sigma^*$ and let $L$ be a logic closed under boolean operations with only
finitely many inequivalent formulas. If for every $u \in P$ and every $v \notin P$ there is a formula
$\varphi_{u,v} \in L$ such that $u \models \varphi_{u,v}$ and $v \not\models \varphi_{u,v}$, then there is a formula $\varphi \in L$ such that for all
$w \in \Sigma^*$, $w \models \varphi \iff w \in P$.

Proof. Let $\Gamma = \{\varphi_{u,v} \mid u \in P, v \notin P\}$, and let $\Gamma'$ be a maximal subset of $\Gamma$ containing
only inequivalent formulas. Since $L$ contains only finitely many inequivalent formulas, $\Gamma'$ is
finite. For every $u \in P$, we define the finite set of formulas $\Gamma'_u = \{\psi \in \Gamma' \mid u \models \psi\}$. Since
all these sets are subsets of the finite set $\Gamma'$, there can only be finitely many of them. Thus
there is a finite set $P' \subseteq P$ such that $\{\Gamma'_u \mid u \in P\} = \{\Gamma'_u \mid u \in P'\}$. Now we set

\[
\varphi := \bigvee_{u \in P'} \bigwedge_{\psi \in \Gamma'_u} \psi .
\]

We have $\varphi \in L$ and for every $w \in \Sigma^*$, $w \in P \iff w \models \varphi$ as required. □

It is well-known [6] that for any $m, n \in \mathbb{N}$, the logics $\mathrm{FO}^2_n$ and $\mathrm{FO}^2_{m,n}$, both with and
without the successor predicate, have only finitely many inequivalent formulas. Thus the
above lemma applies to these logics.



3. Structure Theorem for $\mathrm{FO}^2[<]$

We define boundary positions that point to the first or last occurrences of a letter in
a word, and define an $n$-ranker as a sequence of $n$ boundary positions. In terms of [13],
boundary positions are turtle instructions and $n$-rankers are turtle programs of length $n$.
The following three lemmas show that basic properties about the definedness and position
of these rankers can be expressed in $\mathrm{FO}^2[<]$, and we use these results to prove our structure
theorem.

Definition 3.1. A boundary position denotes the first or last occurrence of a letter in a
given word. Boundary positions are of the form $d_a$ where $d \in \{\triangleright, \triangleleft\}$ and $a \in \Sigma$. The
interpretation of a boundary position $d_a$ on a word $w = w_1 \ldots w_{|w|} \in \Sigma^*$ is defined as
follows.

\[
d_a(w) =
\begin{cases}
\min\{\, i \in [1,|w|] \mid w_i = a \,\} & \text{if } d = \triangleright \\
\max\{\, i \in [1,|w|] \mid w_i = a \,\} & \text{if } d = \triangleleft
\end{cases}
\]

Here we set $\min\{\}$ and $\max\{\}$ to be undefined, thus $d_a(w)$ is undefined if $a$ does not occur
in $w$. A boundary position can also be specified with respect to a position $q \in [1,|w|]$.

\[
d_a(w,q) =
\begin{cases}
\min\{\, i \in [q+1,|w|] \mid w_i = a \,\} & \text{if } d = \triangleright \\
\max\{\, i \in [1,q-1] \mid w_i = a \,\} & \text{if } d = \triangleleft
\end{cases}
\]

Definition 3.2. Let $n$ be a positive integer. An $n$-ranker $r$ is a sequence of $n$ boundary
positions. The interpretation of an $n$-ranker $r = (p_1, \ldots, p_n)$ on a word $w$ is defined as
follows.

\[
r(w) :=
\begin{cases}
p_1(w) & \text{if } r = (p_1) \\
\text{undefined} & \text{if } (p_1, \ldots, p_{n-1})(w) \text{ is undefined} \\
p_n(w, (p_1, \ldots, p_{n-1})(w)) & \text{otherwise}
\end{cases}
\]

Instead of writing $n$-rankers as a formal sequence $(p_1, \ldots, p_n)$, we often use the simpler
notation $p_1 \ldots p_n$. We denote the set of all $n$-rankers by $R_n$, and the set of all $n$-rankers
that are defined over a word $w$ by $R_n(w)$. Furthermore, we set $R^*_n := \bigcup_{i \in [1,n]} R_i$ and
$R^*_n(w) := \bigcup_{i \in [1,n]} R_i(w)$.
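
Definitions 3.1 and 3.2 translate almost literally into code. The sketch below is only an illustration: the encoding of a boundary position as a pair `(direction, letter)` with `">"` for $\triangleright$ and `"<"` for $\triangleleft$, the function names, and the use of `None` for "undefined" are choices made here, not notation from the paper. Positions are 1-based, as in the text.

```python
from typing import List, Optional, Tuple

# Assumed encoding: (">", "a") stands for the boundary position "first a to the
# right", ("<", "a") for "first a to the left".
BoundaryPosition = Tuple[str, str]


def boundary(w: str, d: str, a: str, q: Optional[int] = None) -> Optional[int]:
    """Evaluate d_a(w) if q is None, and d_a(w, q) otherwise (1-based)."""
    if d == ">":
        candidates = range(1 if q is None else q + 1, len(w) + 1)
    else:
        candidates = range(len(w) if q is None else q - 1, 0, -1)
    for i in candidates:
        if w[i - 1] == a:
            return i
    return None  # min{} and max{} of the empty set are undefined


def evaluate(w: str, r: List[BoundaryPosition]) -> Optional[int]:
    """Position r(w) of the ranker r on w, or None if r is undefined on w."""
    pos = None
    for d, a in r:
        pos = boundary(w, d, a, pos)
        if pos is None:
            return None  # a prefix ranker is already undefined
    return pos


# The rankers r2 and r3 from the introduction.
r2 = [(">", "a"), (">", "c")]
r3 = r2 + [("<", "b")]
print(evaluate("cababcba", r2))  # 6
print(evaluate("cababcba", r3))  # 5
print(evaluate("acbbca", r3))    # None, i.e. undefined
```

The last three lines reproduce the examples $r_3(cababcba) = 5$ and "$r_3(acbbca)$ is undefined" given in the introduction.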

Definition 3.3. Let $r$ be an $n$-ranker. As defined above, we have $r = (p_1, \ldots, p_n)$ for
boundary positions $p_i$. The $k$-prefix ranker of $r$ for $k \in [1,n]$ is $r_k := (p_1, \ldots, p_k)$.

Definition 3.4. Let $i, j \in \mathbb{N}$. The order type of $i$ and $j$ is defined as

\[
\mathrm{ord}(i,j) =
\begin{cases}
< & \text{if } i < j \\
= & \text{if } i = j \\
> & \text{if } i > j
\end{cases}
\]


Lemma 3.5 (distinguishing points on opposite sides of a ranker). Let $n$ be a positive integer,
let $u, v \in \Sigma^*$ and let $r \in R_n(u) \cap R_n(v)$. Samson wins the game $\mathrm{FO}^2_n(u,v)$ where initially
$\mathrm{ord}(x^u, r(u)) \ne \mathrm{ord}(x^v, r(v))$.

Proof. We only look at the case where $x^u > r(u)$ and $x^v < r(v)$ since all other cases are
symmetric to this one. For $n = 1$ Samson has a winning strategy: If $r$ is the first occurrence
of a letter, then Samson places $y$ on $r(u)$ and Delilah cannot reply. If $r$ marks the last
occurrence of a letter in the whole word, then Samson places $y$ on $r(v)$. Again, Delilah
cannot reply with any position and thus loses.

For $n > 1$, we look at the prefix ranker $r_{n-1}$ of $r$. One of the following two cases applies.

(1) $r_{n-1}(u) < r(u)$, as shown in Fig. 1. Samson places pebble $y$ on $r(u)$, and Delilah has
to reply with a position that is to the left of $x^v$. She cannot choose a position in the
interval $(r_{n-1}(v), r(v))$, because this section does not contain the letter $u_{r(u)}$. Thus she
has to choose a position left of or equal to $r_{n-1}(v)$. By induction Samson wins the remaining game.

(2) $r(u) < r_{n-1}(u)$, as shown in Fig. 2. Samson places $y$ on $r(v)$, and Delilah has
to reply with a position to the right of $x^u$ and thus to the right of $r(u)$. She cannot
choose any position in $(r(u), r_{n-1}(u))$, because this interval does not contain the
letter $v_{r(v)}$, thus Delilah has to choose a position to the right of or equal to $r_{n-1}(u)$.
By induction Samson wins the remaining game. □

Figure 1: The case $r_{n-1}(u) < r(u)$

Lemma 3.6 (expressing the definedness of a ranker). Let $n$ be a positive integer, and let
$r \in R_n$. There is a formula $\varphi_r \in \mathrm{FO}^2_n[<]$ such that for all $w \in \Sigma^*$, $w \models \varphi_r \iff r \in R_n(w)$.

Proof. Using Lemma 2.3, it suffices to consider arbitrary $u, v \in \Sigma^*$ with $r \in R_n(u)$ and
$r \notin R_n(v)$, and using Fact 2.1, it suffices to show that Samson wins the game $\mathrm{FO}^2_n(u,v)$. If
$r_1$, the shortest prefix ranker of $r$, is not defined over $v$, the letter referred to by $r_1$ occurs
in $u$ but does not occur in $v$. Thus Samson easily wins in one move.

Otherwise we let $r_i = (p_1, \ldots, p_i)$ be the shortest prefix ranker of $r$ that is undefined
over $v$. Thus $r_{i-1}$ is defined over both words. Without loss of generality we assume that
$p_i = \triangleleft_a$. This situation is illustrated in Fig. 3. Notice that $v$ does not contain any $a$'s
to the left of $r_{i-1}(v)$, otherwise $r_i$ would be defined over $v$. Samson places $x$ in $u$ on $r_i(u)$,
and Delilah has to reply with a position right of or equal to $r_{i-1}(v)$. Now Lemma 3.5
applies and Samson wins in $i - 1$ more moves. □

Figure 3: $r_i(v)$ is undefined

Lemma 3.7 (position of a ranker). Let $n$ be a positive integer and let $r \in R_n$. There is a
formula $\varphi_r \in \mathrm{FO}^2_n[<]$ such that for all $w \in \Sigma^*$ and for all $i \in [1,|w|]$, $(w,i) \models \varphi_r \iff
i = r(w)$.

Proof. As in the proof of Lemma 3.6, it suffices to show that for arbitrary $u, v \in \Sigma^*$,
Samson wins the game $\mathrm{FO}^2_n(u,v)$ where initially $x^u = r(u)$ and $x^v \ne r(v)$. If $r$ is defined
over $v$, then we can apply Lemma 3.5 immediately to get the desired strategy for Samson.
Otherwise we use the strategy from Lemma 3.6. □

Theorem 3.8 (structure of $\mathrm{FO}^2_n[<]$). Let $u$ and $v$ be finite words, and let $n \in \mathbb{N}$. The
following two conditions are equivalent.

(i) (a) $R_n(u) = R_n(v)$, and,

(b) for all $r \in R^*_n(u)$ and $r' \in R^*_{n-1}(u)$, $\mathrm{ord}(r(u), r'(u)) = \mathrm{ord}(r(v), r'(v))$

(ii) $u \equiv^2_n v$

Notice that condition (i)(a) is equivalent to $R^*_n(u) = R^*_n(v)$. Instead of proving Theorem
3.8 directly, we prove the following more general version on words with two interpreted
variables.
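
Condition (i) of Theorem 3.8 is a finite combinatorial check, since there are only $(2|\Sigma|)^k$ rankers of length $k$. The following Python sketch enumerates them by brute force; it is an illustration under assumptions made here (the function names, the pair encoding `(direction, letter)` of boundary positions, and checking (i)(a) in the form $R^*_n(u) = R^*_n(v)$ mentioned in the remark above). It tests only condition (i) and does not play the game.

```python
from itertools import product


def evaluate(w, r):
    """Position (1-based) of ranker r on w, or None if r is undefined."""
    pos = None
    for d, a in r:
        if d == ">":
            cands = range(1 if pos is None else pos + 1, len(w) + 1)
        else:
            cands = range(len(w) if pos is None else pos - 1, 0, -1)
        pos = next((i for i in cands if w[i - 1] == a), None)
        if pos is None:
            return None
    return pos


def rankers_up_to(alphabet, n):
    """All rankers of length 1..n over the alphabet (empty list for n <= 0)."""
    steps = list(product("<>", alphabet))
    return [r for k in range(1, n + 1) for r in product(steps, repeat=k)]


def order(i, j):
    return (i > j) - (i < j)


def condition_i(u, v, n):
    """Check condition (i) of Theorem 3.8 for the words u, v and depth n."""
    alphabet = sorted(set(u) | set(v))
    big = rankers_up_to(alphabet, n)        # candidates for R*_n
    small = rankers_up_to(alphabet, n - 1)  # candidates for R*_{n-1}
    # (a): the same rankers of length at most n are defined on u and on v
    if any((evaluate(u, r) is None) != (evaluate(v, r) is None) for r in big):
        return False
    # (b): defined rankers appear in the same relative order in u and in v
    for r in big:
        for s in small:
            ru, su = evaluate(u, r), evaluate(u, s)
            if ru is None or su is None:
                continue
            if order(ru, su) != order(evaluate(v, r), evaluate(v, s)):
                return False
    return True
```

For example, `condition_i("ab", "ba", 1)` holds because both words contain the same letters, whereas `condition_i("ab", "ba", 2)` fails: the 2-ranker $\triangleright_a \triangleright_b$ is defined on `ab` but not on `ba`.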

Theorem 3.9. Let $u$ and $v$ be finite words, let $i_1, i_2 \in [1,|u|]$, let $j_1, j_2 \in [1,|v|]$, and let
$n \in \mathbb{N}$. The following two conditions are equivalent.

(i) (a) $R_n(u) = R_n(v)$, and,

(b) for all $r \in R^*_n(u)$ and $r' \in R^*_{n-1}(u)$, $\mathrm{ord}(r(u), r'(u)) = \mathrm{ord}(r(v), r'(v))$, and,

(c) $(u, i_1, i_2) \equiv^2_0 (v, j_1, j_2)$, and,

(d) for all $r \in R^*_n(u)$, $\mathrm{ord}(i_1, r(u)) = \mathrm{ord}(j_1, r(v))$ and $\mathrm{ord}(i_2, r(u)) = \mathrm{ord}(j_2, r(v))$

(ii) $(u, i_1, i_2) \equiv^2_n (v, j_1, j_2)$

Proof. For $n = 0$, (i)(a), (i)(b) and (i)(d) are vacuous, and (i)(c) is equivalent to (ii). For
$n \ge 1$, we prove the two implications individually using induction on $n$.

We first show "$\neg$(i) $\Rightarrow$ $\neg$(ii)". Assuming that (i) holds for $n \in \mathbb{N}$ but fails for $n + 1$, we
show that $(u, i_1, i_2) \not\equiv^2_{n+1} (v, j_1, j_2)$ by giving a winning strategy for Samson in the $\mathrm{FO}^2_{n+1}$
game on the two structures. If (i)(c) does not hold, then Samson wins immediately. If (i)(d)
does not hold for $n + 1$, then Samson wins by Lemma 3.5. If (i)(a) or (i)(b) does not hold for
$n + 1$, then one of the following three cases applies.

(1) There is an $(n+1)$-ranker that is defined over one word but not over the other.

(2) There are two $n$-rankers that do not agree on their ordering in $u$ and $v$.

(3) There is an $(n+1)$-ranker that does not appear in the same order on both structures
with respect to a $k$-ranker where $k \le n$.

We first look at case (2) where there are two rankers $r, r' \in R_n(u)$ that disagree on
their ordering in $u$ and $v$. Without loss of generality we assume that $r(u) < r'(u)$ and
$r(v) > r'(v)$, and present a winning strategy for Samson in the $\mathrm{FO}^2_{n+1}$ game. In the first
move he places $x$ on $r(u)$ in $u$. Delilah has to reply with $r(v)$ in $v$, otherwise she would lose
the remaining $n$-move game as shown in Lemma 3.5. Let $r'_{n-1}$ be the $(n-1)$-prefix-ranker
of $r'$. We look at two different cases depending on the ordering of $r'_{n-1}$ and $r'$.

For $r'_{n-1}(u) < r'(u)$, the situation is illustrated in Fig. 4. In his second move, Samson
places $y$ on $r'(v)$. Delilah has to reply with a position to the left of $x^u$, but she cannot choose
any position from the interval $(r'_{n-1}(u), r'(u))$ because it does not contain the letter $v_{y^v}$. So
she has to reply with a position left of or equal to $r'_{n-1}(u)$, and Samson wins the remaining
$\mathrm{FO}^2_{n-1}$ game as shown in Lemma 3.5.

Figure 4: Two $n$-rankers appear in different order and $r'$ ends with $\triangleright$.

For $r'_{n-1}(u) > r'(u)$, the situation is illustrated in Fig. 5. In his second move, Samson
places pebble $y$ on $r'(u)$, and Delilah has to reply with a position to the right of $x^v$, but
she cannot choose anything from the interval $(r'(v), r'_{n-1}(v))$ because this section does not
contain the letter $u_{y^u}$. Thus she has to reply with a position right of or equal to $r'_{n-1}(v)$,
and Samson wins the remaining $\mathrm{FO}^2_{n-1}$ game as shown in Lemma 3.5.

Figure 5: Two $n$-rankers appear in different order and $r'$ ends with $\triangleleft$.

Now we look at cases (1) and (3), assuming that case (2) does not apply. We know that
condition (i) from the statement of the theorem fails, but still all $n$-rankers agree on their
ordering. In both case (1) and case (3), there are two consecutive $n$-rankers
$r, r' \in R_n(u)$ with $r(u) < r'(u)$ and a letter $a \in \Sigma$ such that without loss of generality
$a$ occurs in the segment $u_{(r(u), r'(u))}$ but not in the segment $v_{(r(v), r'(v))}$. We
describe a winning strategy for Samson in the game $\mathrm{FO}^2_{n+1}(u,v)$. He places $x$ on an $a$ in
the segment $(r(u), r'(u))$ of $u$, as shown in Fig. 6. Delilah cannot reply with anything in
the interval $(r(v), r'(v))$. If she replies with a position left of or
equal to $r(v)$, then $x$ is on different sides of the $n$-ranker $r$ in the two words. Thus Lemma
3.5 applies and Samson wins the remaining $n$-move game. If Delilah replies with a position
right of or equal to $r'(v)$, then we can apply Lemma 3.5 to $r'$ and get a winning strategy
for the remaining game as well. This concludes the proof of "$\neg$(i) $\Rightarrow$ $\neg$(ii)".

Figure 6: A letter $a$ occurs between $n$-rankers $r, r'$ in $u$ but not in $v$

To show "(i) $\Rightarrow$ (ii)", we assume (i) for $n+1$, and present a winning strategy for Delilah
in the $\mathrm{FO}^2_{n+1}$ game on the two structures. In his first move Samson picks up one of the two
pebbles, and places it on a new position. Without loss of generality we assume that Samson
picks up $x$ and places it on $u$ in his first move. If $x^u = r(u)$ for any ranker $r \in R^*_{n+1}(u)$,
then Delilah replies with $x^v = r(v)$. This establishes (i)(c) and (i)(d) for $n$, and thus Delilah
has a winning strategy for the remaining $\mathrm{FO}^2_n$ game by induction.

If Samson does not place $x^u$ on any ranker from $R^*_{n+1}(u)$, then we look at the closest
rankers from $R^*_n(u)$ to the left and right of $x^u$, denoted by $\lambda$ and $\rho$, respectively. Let $a := u_{x^u}$
and define the $(n+1)$-ranker $s = (\lambda, \triangleright_a)$. On $u$ we have $\lambda(u) < s(u) < \rho(u)$. Because of
(i)(a) $s$ is defined on $v$ as well, and because of (i)(b), we have $\lambda(v) < s(v) < \rho(v)$. If $y^u$ is
not contained in the interval $(\lambda(u), \rho(u))$, then Delilah places $x$ on $s(v)$, which establishes
(i)(c) and (i)(d) for $n$. Thus by induction Delilah has a winning strategy for the remaining
$\mathrm{FO}^2_n$ game.

If both pebbles $x^u$ and $y^u$ occur in the interval $(\lambda(u), \rho(u))$, then we need to be more
careful. Without loss of generality we assume $y^u < x^u$, as illustrated in Fig. 7. Thus Delilah
has to place $x$ in the interval $(y^v, \rho(v))$ and at a position with letter $a := u_{x^u}$. We define the
$(n+1)$-ranker $s = (\rho, \triangleleft_a)$. From (i)(d) we know that $s$ appears on the same side of $y$ in both
structures, thus we have $y^v < s(v) < \rho(v)$. Delilah places her pebble $x$ on $s(v)$ and thus
establishes (i)(c) and (i)(d) for $n$. By induction, Delilah has a winning strategy for the
remaining $\mathrm{FO}^2_n$ game. □

Figure 7: $x$ and $y$ are in the same section

A fundamental property of an $n$-ranker is that it uniquely describes a position in a
given word. Now we show that the converse holds as well: Any position in a word that can
be uniquely described with an $\mathrm{FO}^2[<]$ formula can also be described by a ranker (Lemma
3.11). Furthermore, any $\mathrm{FO}^2[<]$ formula that describes a unique position in any given word
is equivalent to a boolean combination of rankers (Theorem 3.12).

Definition 3.10 (unique position formula). A formula $\varphi \in \mathrm{FO}^2[<]$ with $x$ as a free variable
is a unique position formula if for all $w \in \Sigma^*$ there is at most one $i \in [1,|w|]$ such that
$(w,i) \models \varphi$.

Lemma 3.11. Let $n$ be a positive integer and let $\varphi \in \mathrm{FO}^2_n[<]$ be a unique position formula.
Let $u \in \Sigma^*$ and let $i \in [1,|u|]$ such that $(u,i) \models \varphi$. Then $i = r(u)$ for some ranker $r \in R^*_n$.

Proof. Suppose for the sake of a contradiction that there is no ranker $r \in R^*_n$ such that
$(u,i) \models \varphi_r$. Because the first and last positions in $u$ are described by 1-rankers, we know
that $i \notin \{1, |u|\}$. We construct a new word $v$ by doubling the symbol at position $i$ in $u$,
$v = u_1 \ldots u_{i-1} u_i u_i u_{i+1} \ldots u_{|u|}$. By assumption, there is no $n$-ranker that describes position
$i$ in $u$. A brief argument by contradiction shows that there are also no $n$-rankers that
describe positions $i$ or $i + 1$ in $v$: Assuming that such a ranker exists, let $r$ be the shortest
such ranker. Thus none of the prefix rankers of $r$ point to either positions $i$ or $i + 1$ in $v$.
This means that all prefix rankers of $r$ are interpreted in exactly the same way on both
$u$ and $v$, and irrespective of whether $r(v)$ points to $i$ or $i + 1$, we have $r(u) = i$, a
contradiction. Hence all $n$-rankers are insensitive to the doubling of $u_i$, and the two words
$u$ and $v$ agree on the definedness of all $n$-rankers and on their ordering. By Theorem 3.9,
we thus have $(u,i) \equiv^2_n (v,i) \equiv^2_n (v,i+1)$, which contradicts the fact that $\varphi$ is a unique
position formula. □

Theorem 3.12. Let $n$ be a positive integer and let $\varphi \in \mathrm{FO}^2_n[<]$ be a unique position
formula. There is a $k \in \mathbb{N}$, and there are mutually exclusive formulas $\alpha_i \in \mathrm{FO}^2_n[<]$ and
rankers $r_i \in R^*_n$ such that

\[
\varphi \equiv \bigvee_{i \in [1,k]} (\alpha_i \wedge \varphi_{r_i})
\]

where $\varphi_{r_i} \in \mathrm{FO}^2_n[<]$ is the formula from Lemma 3.7 that uniquely describes the ranker $r_i$.

Proof. Let $T$ be the set of all $\mathrm{FO}^2_n[<]$ types of words over $\Sigma$ with one interpreted variable.
Because there are only finitely many inequivalent formulas in $\mathrm{FO}^2_n[<]$, $T$ is finite. Let
$T' \subseteq T$ be the set of all types that satisfy $\varphi$. We set $T' = \{T_1, \ldots, T_k\}$ and let $\alpha_i \in \mathrm{FO}^2_n[<]$
be a description of type $T_i$. Thus $\varphi \equiv \bigvee_{i \in [1,k]} \alpha_i$.

Now suppose that $(u,j) \models \varphi$. Thus $(u,j) \models \alpha_i$ for some $i$. By Lemma 3.11, $(u,j) \models \varphi_{r_i}$
for some $r_i \in R^*_n$. Thus $\alpha_i \rightarrow \varphi_{r_i}$, since $\varphi_{r_i} \in \mathrm{FO}^2_n$ and $\alpha_i$ is a complete $\mathrm{FO}^2_n$ formula. Thus
$\alpha_i \equiv \alpha_i \wedge \varphi_{r_i}$, so $\varphi$ is in the desired form. □



4. Alternation hierarchy for $\mathrm{FO}^2[<]$

We define alternation rankers and prove our structure theorem (Theorem 4.5) for
$\mathrm{FO}^2_{m,n}[<]$. Surprisingly the number of alternating blocks of $\triangleleft$ and $\triangleright$ in the rankers
corresponds exactly to the number of alternating quantifier blocks. The main ideas from our
proof of Theorem 3.8 still apply here, but keeping track of the number of alternations does
add complications.

Definition 4.1 ($m$-alternation $n$-ranker). Let $m, n \in \mathbb{N}$ with $m \le n$. An $m$-alternation
$n$-ranker, or $(m,n)$-ranker, is an $n$-ranker with exactly $m$ blocks of boundary positions that
alternate between $\triangleright$ and $\triangleleft$.
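
As a small illustration of Definition 4.1, the number of blocks of a ranker can be read off by grouping consecutive boundary positions with the same direction. The sketch below uses an encoding assumed here, not taken from the paper: a ranker is a list of pairs `(direction, letter)` with `">"` for $\triangleright$ and `"<"` for $\triangleleft$, and the function names are hypothetical.

```python
from itertools import groupby


def alternation_blocks(r):
    """Number of maximal blocks of consecutive equal directions in ranker r."""
    return sum(1 for _ in groupby(d for d, _ in r))


def is_mn_ranker(r, m, n):
    """True iff r is an (m, n)-ranker: length n with exactly m blocks."""
    return len(r) == n and alternation_blocks(r) == m


# The 3-ranker r3 from the introduction has the direction pattern > > <,
# i.e. two blocks, so it is a (2, 3)-ranker.
r3 = [(">", "a"), (">", "c"), ("<", "b")]
print(alternation_blocks(r3))   # 2
print(is_mn_ranker(r3, 2, 3))   # True
```

Since every block contains at least one boundary position, the number of blocks never exceeds the length, which matches the requirement $m \le n$.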

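We use the following notation for alternation rankers.

\[
\begin{aligned}
R_{m,n}(w) &:= \{\, r \mid r \text{ is an } m\text{-alternation } n\text{-ranker and defined over the word } w \,\} \\
R_{m\triangleright,n}(w) &:= \{\, r \in R_{m,n}(w) \mid r \text{ ends with } \triangleright \,\} \\
R^*_{m,n}(w) &:= \bigcup_{i \in [1,m],\, j \in [1,n]} R_{i,j}(w)
\end{aligned}
\]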

«e[i,n] 

Lemma 4.2. Let $m$ and $n$ be positive integers with $m \le n$, let $u, v \in \Sigma^*$, and let
$r \in R_{m,n}(u) \cap R_{m,n}(v)$. Samson wins the game $\mathrm{FO}^2_{m,n}(u,v)$ where initially
$\mathrm{ord}(r(u), x^u) \ne \mathrm{ord}(r(v), x^v)$.

Furthermore, Samson can start the game with a move on $u$ if $r$ ends with $\triangleright$, $r(u) < x^u$
and $r(v) > x^v$, or if $r$ ends with $\triangleleft$, $r(u) > x^u$ and $r(v) < x^v$. He can start the game with a
move on $v$ if $r$ ends with $\triangleright$, $r(u) > x^u$ and $r(v) < x^v$, or if $r$ ends with $\triangleleft$, $r(u) < x^u$ and
$r(v) > x^v$.

Proof. If $m = n = 1$, then we can immediately apply the base case from the proof of Lemma
3.5. Samson wins in one move, placing his pebble on $u$ or $v$ as specified.

For the remaining cases, we assume without loss of generality that $r$ ends with $\triangleright$ and
that $x^u > r(u)$ and $x^v < r(v)$. Let $r_{n-1}$ be the $(n-1)$-prefix ranker of $r$. This situation is
illustrated in Fig. 1 of Lemma 3.5. Samson places $y$ on $r(u)$, and creates a situation where
$y^u > r_{n-1}(u)$ and $y^v < r_{n-1}(v)$. If $r_{n-1}$ ends with $\triangleleft$, then by induction Samson wins the
remaining $\mathrm{FO}^2_{m-1,n-1}$ game and thus he has a winning strategy for the $\mathrm{FO}^2_{m,n}$ game. If
$r_{n-1}$ ends with $\triangleright$, then by induction Samson wins the remaining $\mathrm{FO}^2_{m,n-1}$ game starting
with a move on $u$, and thus he has a winning strategy for the $\mathrm{FO}^2_{m,n}$ game. □

Lemma 4.3. Let $m$ and $n$ be positive integers with $m \le n$ and let $r \in R_{m,n}$. There is a
formula $\varphi_r \in \mathrm{FO}^2_{m,n}[<]$ such that for all $w \in \Sigma^*$, $w \models \varphi_r \iff r \in R_{m,n}(w)$.

Proof. Using Lemma 2.3, it suffices to consider arbitrary $u, v \in \Sigma^*$ with $r \in R_{m,n}(u)$ and
$r \notin R_{m,n}(v)$, and using Fact 2.1, it suffices to show that Samson wins the game $\mathrm{FO}^2_{m,n}(u,v)$.
Let $r_i = (p_1, \ldots, p_i)$ be the shortest prefix ranker of $r$ that is undefined over $v$, and we assume
without loss of generality that this ranker ends with the boundary position $p_i = \triangleleft_a$ for some
$a \in \Sigma$. This situation is illustrated in Fig. 3 for Lemma 3.7. In his first move Samson places
$x$ on $r_i(u)$ and thus forces a situation where $x^u < r_{i-1}(u)$ and $x^v > r_{i-1}(v)$. If $r_{i-1}$ ends
with $\triangleleft$, then according to Lemma 4.2 Samson wins the remaining $\mathrm{FO}^2_{m,n-1}$ game starting
with a move on $u$. Otherwise $r_{i-1}$ ends with $\triangleright$, and thus by Lemma 4.2 Samson wins the
remaining $\mathrm{FO}^2_{m-1,n-1}$ game starting with a move on $v$. □

Lemma 4.4. Let $m$ and $n$ be positive integers with $m \le n$ and let $r \in R_{m,n}$. There is a
formula $\varphi_r \in \mathrm{FO}^2_{m,n}[<]$ such that for all $w \in \Sigma^*$ and for all $i \in [1,|w|]$, $(w,i) \models \varphi_r \iff
i = r(w)$.

Proof. As in the proof of Lemma 4.3, it suffices to show that Samson wins the game
$\mathrm{FO}^2_{m,n}(u,v)$ where initially $x^u = r(u)$ and $x^v \ne r(v)$. Depending on whether $r$ is
defined over $v$, we use the strategies from Lemma 4.2 or Lemma 4.3. □

Theorem 4.5 (structure of $\mathrm{FO}^2_{m,n}[<]$). Let $u$ and $v$ be finite words, and let $m, n \in \mathbb{N}$ with
$m \le n$. The following two conditions are equivalent.

(i) (a) $R_{m,n}(u) = R_{m,n}(v)$, and,

(b) for all $r \in R^*_{m,n}(u)$ and for all $r' \in R^*_{m-1,n-1}(u)$, we have
$\mathrm{ord}(r(u), r'(u)) = \mathrm{ord}(r(v), r'(v))$, and,

(c) for all $r \in R^*_{m,n}(u)$ and $r' \in R^*_{m,n-1}(u)$ such that $r$ and $r'$ end with different
directions, $\mathrm{ord}(r(u), r'(u)) = \mathrm{ord}(r(v), r'(v))$

(ii) $u \equiv^2_{m,n} v$

Just as before with Theorem 3.8, instead of proving Theorem 4.5 directly, we prove a
more general version that applies to words with two interpreted variables. The statement
of the general version is asymmetric with respect to the roles of the two structures $u$ and
$v$. This is necessary because of the correspondence between quantifier alternations (i.e.
alternations between $u$ and $v$ in the game) and alternations of directions in the rankers.
This asymmetry already affected the statement of Lemma 4.2, where Samson's winning
strategy starts with a move on the specified structure. In fact, as the proof of the following
theorem shows, he does not have a winning strategy that starts with a move on the other
structure. We remark that conditions (i)(a) through (i)(e) of the general theorem are
completely symmetric with respect to the roles of $u$ and $v$, and only conditions (i)(f) and
(ii) are asymmetric. Theorem 4.5 follows directly from the general theorem, since here
$i_1 = i_2 = j_1 = j_2 = 1$, thus conditions (i)(e) and (i)(f) are trivially true, and the equivalence
holds with the roles of $u$ and $v$ reversed as well.
Theorem 4.6. Let u and v be finite words, let $i_1, i_2 \in [1, |u|]$, let $j_1, j_2 \in [1, |v|]$, and let
$m, n \in \mathbb{N}$ with $m \le n$. The following two conditions are equivalent.

(i) (a) $R_{m,n}(u) = R_{m,n}(v)$, and,

    (b) for all $r \in R^*_{m,n}(u)$ and for all $r' \in R^*_{m-1,n-1}(u)$, we have
        $\mathrm{ord}(r(u), r'(u)) = \mathrm{ord}(r(v), r'(v))$, and,

    (c) for all $r \in R^*_{m,n}(u)$ and $r' \in R^*_{m,n-1}(u)$ such that r and r' end with different
        directions, $\mathrm{ord}(r(u), r'(u)) = \mathrm{ord}(r(v), r'(v))$, and,

    (d) $(u, i_1, i_2) \equiv_0 (v, j_1, j_2)$, and,

    (e) for all $r \in R^*_{m-1,n}(u)$, $\mathrm{ord}(r(u), i_1) = \mathrm{ord}(r(v), j_1)$ and
        $\mathrm{ord}(r(u), i_2) = \mathrm{ord}(r(v), j_2)$, and,

    (f) for all $r \in R_{m,n}(u)$, and $(i, j) \in \{(i_1, j_1), (i_2, j_2)\}$,
        (f1) if r ends on $\triangleright$ and $r(u) = i$, then $r(v) \le j$
        (f2) if r ends on $\triangleright$ and $r(u) < i$, then $r(v) < j$
        (f3) if r ends on $\triangleleft$ and $r(u) = i$, then $r(v) \ge j$
        (f4) if r ends on $\triangleleft$ and $r(u) > i$, then $r(v) > j$

(ii) Delilah wins the game $\mathrm{FO}^2_{m,n}((u, i_1, i_2), (v, j_1, j_2))$ if Samson starts with a move on
     $(u, i_1, i_2)$.
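
The asymmetric condition (i)(f) can be spelled out for a single ranker and a single pair (i, j). The following is a minimal sketch, not part of the original proof: the function name and the encoding of directions as the strings ">" and "<" are illustrative assumptions, and the non-strict inequalities in (f1) and (f3) are read off from how the proof below uses these conditions.

```python
def condition_f_holds(direction, r_u, r_v, i, j):
    """Check (f1)-(f4) for one ranker r and one pair (i, j).

    direction: ">" if r ends on a right-pointing boundary position, "<" otherwise
               (this string encoding is an assumption of the sketch).
    r_u, r_v:  the positions r(u) and r(v), both assumed to be defined.
    """
    if direction == ">":
        if r_u == i and not r_v <= j:   # (f1)
            return False
        if r_u < i and not r_v < j:     # (f2)
            return False
    else:
        if r_u == i and not r_v >= j:   # (f3)
            return False
        if r_u > i and not r_v > j:     # (f4)
            return False
    return True
```

Note that a right-pointing ranker with r(u) > i is not constrained at all by (f); this is where the statement stops being symmetric in u and v.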

Proof. As in the proof of Theorem 3.8 we use induction on n. For n = 0, condition (i)(d)
just by itself is equivalent to (ii), and all other conditions of (i) are vacuous. For $n \ge 1$, we
first show "(ii) $\Rightarrow$ (i)".

Suppose that (i) holds for (m, n), but fails for (m, n + 1). If (i)(d) does not hold then
Samson wins immediately. If (i)(e) does not hold for (m, n + 1), then by Lemma 4.2 Samson
wins the (m, n + 1)-game on (u, v), starting with a move on either u or v. If Samson can
start with a move on u, we have established that (ii) is false. Otherwise, we reverse the
roles of u and v, and observe that condition (i)(e) still remains the same. Thus, even if
Samson needs to start with a move on v, he still has a winning strategy, and (ii) does not
hold for (m, n + 1). If (i)(f) does not hold for (m, n + 1), then again by using Lemma 4.2
Samson wins the (m, n + 1)-game on (u, v) starting with a move on u.

If one of (i)(a), (i)(b) or (i)(c) fails, then we show that Samson has a winning strategy
for the game $\mathrm{FO}^2_{m,n+1}(u, v)$. We observe that it does not matter what structure Samson
chooses for his first move, since all of (i)(a), (i)(b) and (i)(c) are completely symmetric with
respect to the roles of u and v. Thus if Samson's winning strategy starts with a move on
v, we can reverse the roles of u and v and get a winning strategy starting with a move on u.
One of the following cases applies.

(1) There is a ranker $r \in R_{m,n+1}$ that is defined over one structure but not over the other.

This first case applies if (i)(a) fails for (m, n + 1). If condition (i)(b) fails for (m, n + 1), then
there are two n-rankers for which it fails, or an (n + 1)-ranker and an n-ranker. This leads
to the following two cases.

(2) There are two rankers $r \in R_{m,n}(u)$ and $r' \in R_{m-1,n}(u)$ that disagree on their order, i.e.
$\mathrm{ord}(r(u), r'(u)) \ne \mathrm{ord}(r(v), r'(v))$.

(3) There are two rankers $r \in R_{m,n+1}(u)$ and $r' \in R_{m-1,n}(u)$ that disagree on their order.

In a similar fashion, we obtain the remaining two cases if condition (i)(c) fails for (m, n + 1).

(4) There are rankers $r, r' \in R_{m,n}(u)$ that end on different directions and disagree on their
order.

(5) There are rankers $r \in R_{m,n+1}(u)$ and $r' \in R_{m,n}(u)$ that end on different directions and
disagree on their order.

We look at the cases (2) and (4) first, then deal with case (1) assuming that cases (2) 
and (4) do not apply, and finally look at cases (3) and (5). 

For case (2), we assume that $r(u) < r'(u)$, as illustrated in Fig. 8. The situation for
$r(u) > r'(u)$ is completely symmetric. Depending on the last boundary position of r, one
of the following two subcases applies.

• r ends with $\triangleright$. Samson places x on r(u) in his first move. If Delilah replies with a
position to the left of r'(v) or equal to r'(v), then $x^v < r(v)$. Thus we can apply Lemma
4.2 to get a winning strategy for Samson in the remaining $\mathrm{FO}^2_{m,n}$ game that starts with
a move on u. If Delilah replies with a position to the right of r'(v), Samson has a
winning strategy for the remaining $\mathrm{FO}^2_{m-1,n}$ game. Thus we have a winning strategy for
Samson in the $\mathrm{FO}^2_{m,n+1}$ game.

• r ends with $\triangleleft$. This is similar to the previous case, but now Samson places x on r(v) in
his first move. If Delilah replies with a position to the right of r'(u), or equal to r'(u),
then as above we get a winning strategy for Samson in the remaining $\mathrm{FO}^2_{m,n}$ game that
starts with a move on v. Otherwise we get a winning strategy for Samson with only
m - 1 alternations for the remaining game. Thus again he has a winning strategy for the
$\mathrm{FO}^2_{m,n+1}$ game.

Figure 8: r and r' appear in different order

For case (4), Samson's winning strategy is very similar to the previous case. If $r(u) <
r'(u)$ and r ends with $\triangleright$, then Samson places x on r(u) in his first move. If Delilah replies
with a position to the right of r(u), then Samson's winning strategy is as above. Otherwise
x is on different sides of r' and Samson has a winning strategy for the remaining $\mathrm{FO}^2_{m,n}$
game that starts with a move on u. All together, he has a winning strategy for the $\mathrm{FO}^2_{m,n+1}$
game. The remaining three cases (ordering of r(u) and r'(u) and ending direction of r)
work in the same way.

Similar to what we did in the proof of Theorem 3.8, we can reduce the remaining cases
to an easier situation where a certain segment contains a certain letter in one structure,
but not in the other structure, and then apply Lemma 4.2 to obtain a winning strategy for
Samson.

To deal with case (1), we assume that the previous two cases, (2) and (4), do not apply.
Without loss of generality, say that the (m, n + 1)-ranker r is defined over u but not over v.
Let $a := u_{r(u)}$ be the letter in u at position r(u). We define the following sets of rankers.

    $R_\ell := \{ s \in R^*_{m\triangleright,n}(u) \mid s(u) < r(u) \}$

    $R_r := \{ s \in R^*_{m\triangleleft,n}(u) \mid s(u) > r(u) \}$

Notice that all rankers from $R_\ell$ appear to the left of all rankers from $R_r$ in u. From
the inductive hypothesis, and from the fact that both cases (2) and (4) do not apply, it
follows that over v all rankers from $R_\ell$ appear to the left of all rankers from $R_r$ as well.
However, the rankers from $R_\ell$ and $R_r$ by themselves do not necessarily appear in the same
order in both structures. We look at the ordering of these rankers in v, and let $\lambda$ be the
rightmost ranker from $R_\ell$ and $\rho$ be the leftmost ranker from $R_r$. By construction, we have
$\lambda(u) < r(u) < \rho(u)$, so the segment $(\lambda, \rho)$ in u contains the letter a. Let $r_n$ be the
n-prefix-ranker of r, and observe that $r_n$ is defined on both structures and that $r_n$ is contained in
either $R_\ell$ or $R_r$. Because r is not defined on v, the letter a does not occur in v either to
the right of $r_n$ if $r_n \in R_\ell$, or to the left of $r_n$ if $r_n \in R_r$. Thus the segment $(\lambda, \rho)$ does not
contain the letter a in v.

Now we know that a occurs in the segment $(\lambda, \rho)$ in u but not in v, and thus we have
established the situation illustrated in Fig. 9. Samson places his first pebble on an a within
this section of u, and Delilah has to reply with a position outside of this section. No matter
what side of the segment she chooses, with Lemma 4.2 Samson has a winning strategy for the
remaining game and thus wins the $\mathrm{FO}^2_{m,n+1}$ game.

Figure 9: A letter occurs between rankers r, r' in u but not in v

In cases (3) and (5), we again assume that cases (2) and (4) do not apply, and we look at
the same sets of rankers, $R_\ell$ and $R_r$, and at $r_n$, the n-prefix-ranker of r. We assume that
$r(u) < r'(u)$ and that r ends with $\triangleright$; all three other cases are completely
symmetric. Notice that $r_n$ is an (m - 1, n)-ranker, or an (m, n)-ranker that ends with $\triangleright$.
Thus both structures agree on the ordering of $r_n$ and r'. The relative positions of all these
rankers are illustrated in Fig. 10. As above, let $\lambda$ be the rightmost ranker from $R_\ell$ and let $\rho$
be the leftmost ranker from $R_r$, with respect to the ordering of these rankers on v. Again we
know that $\lambda(u) < r(u) < \rho(u)$ and therefore the segment $(\lambda, \rho)$ of u contains an a. Notice
that $r_n \in R_\ell$ and $r' \in R_r$, thus $r_n(v) \le \lambda(v) < \rho(v) \le r'(v)$. Thus the segment $(\lambda, \rho)$ does
not contain the letter a in v, providing Samson with a winning strategy as argued above.

Figure 10: Ranker positions, case (4)

To prove "(i) $\Rightarrow$ (ii)", we assume that the theorem holds for n, and that (i) holds for
(m, n + 1), and we present a winning strategy for Delilah in the game $\mathrm{FO}^2_{m,n+1}(u, v)$ where
Samson starts with a move on u.

If Samson places x on a ranker $r \in R^*_{m-1,n}(u)$, then Delilah replies by placing x on the
same ranker on v. Since (i)(b) holds for (m, n + 1), this establishes (i)(e) and (i)(f) for
(m, n). It also establishes (i)(e) and (i)(f) for (m - 1, n) with
reversed roles of u and v. Thus we can apply the inductive hypothesis to get a winning
strategy for Delilah in the remaining game.

If $x^u = y^u$ after Samson's first move, then Delilah replies with $x^v = y^v$. We use the
inductive hypothesis to argue that Delilah wins the remaining n-move game, no matter
what structure Samson chooses for his next move. If he chooses to play on u, then the
remaining game is an (m, n)-game. Since in the first move Delilah set $x^v = y^v$, we have
(i)(e) and (i)(f) for (m, n), and thus the inductive hypothesis applies and Delilah wins the
remaining game. On the other hand, if Samson chooses to play on v for the next move, the
remaining game is an (m - 1, n)-game, since he started with a move on u. Because Delilah
set $x^v = y^v$ in the first move, (i)(e) for (m, n + 1) implies both (i)(e) and (i)(f) for (m - 1, n)
with reversed roles of u and v. Thus we can again use the inductive hypothesis to get a
winning strategy for Delilah in the remaining game.
Otherwise we assume that $x^u < y^u$ after Samson's first move; the case for $x^u > y^u$ is
completely symmetric. We look at the following two sets of rankers.

    $R_\ell := \{ r \in R^*_{m\triangleright,n}(u) \mid r(u) < x^u \}$

    $R_r := \{ r \in R^*_{m\triangleleft,n}(u) \mid r(u) > x^u \}$

On u, all rankers from $R_\ell$ occur to the left of all rankers from $R_r$. Since (i)(c) holds for
(m, n + 1), this is also true for the positions of these rankers on v. Let a be the letter
Samson places his pebble on. To establish both (i)(e) and (i)(f) for (m, n), Delilah needs to
find an a in v that is to the right of all rankers from $R_\ell$ and to the left of all rankers from
$R_r$. We define

    $R_= := \{ r \in R^*_{m\triangleright,n}(u) - R^*_{m-1,n}(u) \mid r(u) = x^u \}$

    $R'_\ell := \{ r \triangleright a \mid r \in R_\ell \} \cup R_=$

and have Delilah place her pebble $x^v$ on the rightmost ranker from $R'_\ell$ on v. This position
of course is labeled with an a. Since on u all rankers from $R'_\ell$ occur to the left of or at $x^u$,
all of them occur strictly to the left of $y^u$. Since all rankers in $R'_\ell$ are from $R^*_{m-1,n+1}(u)$ or
$R_{m\triangleright,n+1}(u)$, we can apply (i)(e) and (i)(f2), and we see that all of these rankers also appear
to the left of $y^v$. Therefore we have $x^v < y^v$, which makes sure that Delilah does not lose
in this move, and also establishes (i)(d).

To complete the inductive step, we need to argue that Delilah's move also establishes
(i)(e) and (i)(f), both for (m, n), and for (m - 1, n) with reversed roles of u and v. Then,
using the inductive hypothesis, Delilah has a winning strategy for the remaining game, no
matter what side Samson chooses for his next move.

We observe that all rankers from $R'_\ell$ appear to the left of the rankers from $R_r$. This is
true by definition on u, and holds for v because (i)(b) and (i)(c) hold for (m, n + 1). Since
Delilah placed $x^v$ on a ranker from $R'_\ell$, we have (i)(e), (i)(f2) and (i)(f4) for (m, n) for
all rankers from $R_r$. And since Delilah placed $x^v$ on the rightmost of the rankers from $R'_\ell$,
we know that all rankers from $R_\ell$ appear to the left of $x^v$, just as they do on u. Thus we
have (i)(e), (i)(f2) and (i)(f4) for the rankers from $R_\ell$ as well, and therefore for all rankers
mentioned in those conditions.

All rankers from $R_{m\triangleright,n}(u)$ that appear at $x^u$ are in $R_=$, since we already dealt with the
case where $x^u$ does appear at a ranker from $R^*_{m-1,n}$. Since Delilah chose $x^v$ as the rightmost
ranker from $R'_\ell$, all of these rankers appear to the left of or at $x^v$, and we have established
(i)(fi) for {m,n). For condition (i)(f3), we need to argue about R^- From (i)(b) and (i)(c) 
for {m,n + 1), we know that all rankers from R^ appear to the right of or at the same 
position as the rankers from R'^ on v, just as they do on u. Thus (i)(f3) holds as well. 

Now that we have established (i) for (m, n), we use the inductive hypothesis to get a
winning strategy for Delilah for the remaining game if Samson's next move is on u. For
the case where his next move is on v, we only need to establish (i) for (m - 1, n), but with
reversed roles of u and v. Reversing the roles of the two structures only affects condition
(i)(f), and (i)(f) for (m - 1, n) follows immediately from (i)(e) for (m, n). Thus Delilah also
wins the remaining game if Samson's next move is on v. □

Using Theorem 4.5, we show that for any fixed alphabet $\Sigma$, at most $|\Sigma| + 1$ alternations
are useful. Intuitively, each boundary position in a ranker says that a certain letter does
not occur in some part of a word. Alternations are only useful if they visit one of these
previous parts again. Once we visited one part of a word $|\Sigma|$ times, this part cannot contain
any more letters and thus is empty.

Theorem 4.7. Let $\Sigma$ be a finite alphabet, let $u, v \in \Sigma^*$ and $n \in \mathbb{N}$. If $u \equiv^2_{|\Sigma|+1,n} v$, then
$u \equiv^2_n v$.

Proof. Suppose for the sake of a contradiction that $u \equiv^2_{|\Sigma|+1,n} v$ and $u \not\equiv^2_n v$. Thus, using
Theorem 4.5, u and v agree on the definedness of all $(|\Sigma| + 1, n)$-rankers, and on their order
with respect to all $(|\Sigma|, n - 1)$-rankers and some $(|\Sigma| + 1, n - 1)$-rankers. But since $u \not\equiv^2_n v$,
u and v need to disagree on the properties of some other ranker. Let $r = (p_1, \ldots, p_t)$ with
$t \in \mathbb{N}$ be the shortest such ranker. We know that r has more than $|\Sigma|$ blocks of alternating
directions, say r is an m-alternation ranker for some $m > |\Sigma|$. Let $1 \le k_1, \ldots, k_m \le t$ be
the indices of the boundary positions at the end of each block, i.e. where $p_{k_i}$, $1 \le i < m$,
points to a different direction than $p_{k_i+1}$. For the last of those indices we have $k_m = t$.

We look at the prefix rankers of r up to the end of each alternating block, $r_{k_i} :=
(p_1, \ldots, p_{k_i})$, and the intervals defined by these prefix rankers. We set $I_0(u) := [1, |u|]$,
$r_0(u) := 0$ if $p_1$ points to the right, and $r_0(u) := |u| + 1$ if $p_1$ points to the left. For all
$i \in [1, m]$ let (reading $r_{k_0}$ as $r_0$)

    $I_i(u) := \begin{cases} [r_{k_{i-1}}(u) + 1,\ r_{k_i}(u) - 1] & \text{if } p_{k_i} \text{ points to the right} \\ [r_{k_i}(u) + 1,\ r_{k_{i-1}}(u) - 1] & \text{if } p_{k_i} \text{ points to the left} \end{cases}$

Notice that by definition the letter mentioned in $p_{k_i}$ does not occur in the interval $I_i$.

Suppose that for all $i \in [1, m]$ we have $r_{k_i}(u) \in I_{i-1}(u)$. Then the letter mentioned in
$p_{k_i}$ has to occur in the interval $I_{i-1}(u)$ of u, but the interval $I_{|\Sigma|}(u)$ of u cannot contain any
of the $|\Sigma|$ distinct letters. Therefore $r_{k_{|\Sigma|+1}}(u) \notin I_{|\Sigma|}(u)$, and we have a contradiction.

Otherwise there is an $i \in [1, m]$ such that $r_{k_i}(u) \notin I_{i-1}(u)$. We will construct a ranker
r' that is shorter than r, does not have more alternations than r and occurs at exactly
the same position as r in both u and v. The main idea for this construction is that if
$r_{k_i}(u) \notin I_{i-1}(u)$, then it is not useful to enter this interval at all. By our assumption, u
and v disagree on some property of the ranker r, and thus on some property of the shorter
ranker r'. This contradicts our assumption that r was the shortest such ranker.

Now we show how to construct a shorter ranker r' that occurs at the same position
as r. We assume without loss of generality that $p_{k_i}$ points to the left. In this case we
have $r_{k_i}(u) \notin I_{i-1}(u) = [r_{k_{i-2}}(u) + 1,\ r_{k_{i-1}}(u) - 1]$. We look at the relative positions
of the rankers $r_{k_{i-1}+1}, \ldots, r_{k_i}$ with respect to the ranker $r_{k_{i-2}}$. We know that
$r_{k_i}(u) \le r_{k_{i-2}}(u)$, and we are interested in the right-most of the rankers
$r_{k_{i-1}+1}, \ldots, r_{k_i}$ that is still outside of the interval $I_{i-1}(u)$. Let $r_j$ be this ranker.
Thus we have

    $r_{k_i}(u) < \ldots < r_j(u) \le r_{k_{i-2}}(u) < r_{j-1}(u) < \ldots < r_{k_{i-1}+1}(u) < r_{k_{i-1}}(u)$

We know that $u \equiv^2_{|\Sigma|+1,n} v$, thus by Theorem 4.5 these rankers occur in exactly the same
order in v. Now we set $s := (r_{k_{i-2}}, p_j, \ldots, p_{k_i})$. Because u and v agree on the ordering of
the relevant rankers, we have $s(u) = r_{k_i}(u)$ and $s(v) = r_{k_i}(v)$. Therefore we have reduced
the size of a prefix of r without increasing the number of alternations, and thus have a
shorter ranker r' that occurs at the same position as r in both structures. □

In order to prove that the alternation hierarchy for $\mathrm{FO}^2$ is strict, we define example
languages that can be separated by a formula of a given alternation depth m, but that
cannot be separated by any formula of lower alternation depth. As Theorem 4.7 shows, we
need to increase the size of the alphabet with increasing alternation depth. We inductively
define the example words $u_{m,n}$ and $v_{m,n}$ and the example languages $K_m$ and $L_m$ over finite
alphabets $\Sigma_m = \{a_0, \ldots, a_{m-1}\}$. Here i, m and n are positive integers.

    $u_{1,n} := a_0$                                      $v_{1,n} := \varepsilon$
    $u_{2,n} := a_0 (a_1 a_0)^{2n}$                       $v_{2,n} := (a_1 a_0)^{2n}$
    $u_{2i+1,n} := (a_0 \ldots a_{2i})^n\, u_{2i,n}$         $v_{2i+1,n} := (a_0 \ldots a_{2i})^n\, v_{2i,n}$
    $u_{2i+2,n} := u_{2i+1,n}\, (a_{2i+1} \ldots a_0)^n$     $v_{2i+2,n} := v_{2i+1,n}\, (a_{2i+1} \ldots a_0)^n$

Notice that $u_{m,n}$ and $v_{m,n}$ are almost identical: if we delete only one $a_0$ from $u_{m,n}$, we get
$v_{m,n}$. Finally, we set $K_m := \bigcup_{n \ge 1} \{u_{m,n}\}$ and $L_m := \bigcup_{n \ge 1} \{v_{m,n}\}$.
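
The inductive definition of the example words can be made concrete with a short sketch. The function name `example_words` and the encoding of the letter $a_i$ as the integer i are illustrative assumptions, not part of the paper; the construction itself follows the four defining equations for $u_{m,n}$ and $v_{m,n}$.

```python
def example_words(m, n):
    """Return (u, v) for the words u_{m,n} and v_{m,n} as lists of letter indices.

    The letter a_i is encoded as the integer i (an assumption of this sketch).
    """
    if m < 1 or n < 1:
        raise ValueError("m and n must be positive integers")
    # m = 1: u_{1,n} = a_0 and v_{1,n} is the empty word.
    u, v = [0], []
    if m == 1:
        return u, v
    # m = 2: append (a_1 a_0)^{2n} to both words.
    tail = [1, 0] * (2 * n)
    u, v = u + tail, v + tail
    for level in range(3, m + 1):
        if level % 2 == 1:
            # level = 2i+1: prepend (a_0 ... a_{2i})^n
            block = list(range(level)) * n
            u, v = block + u, block + v
        else:
            # level = 2i+2: append (a_{2i+1} ... a_0)^n
            block = list(range(level - 1, -1, -1)) * n
            u, v = u + block, v + block
    return u, v
```

For instance, `example_words(3, 1)` yields the words a0 a1 a2 a0 a1 a0 a1 a0 and a0 a1 a2 a1 a0 a1 a0, which differ only in the single middle a0.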



Definition 4.8. A formula $\varphi$ separates two languages $K, L \subseteq \Sigma^*$ if for all $w \in K$ we have
$w \models \varphi$ and for all $w \in L$ we have $w \not\models \varphi$, or vice versa.

Lemma 4.9. For all $m \in \mathbb{N}$, there is a formula $\varphi_m \in \mathrm{FO}^2[<]\text{-ALT}[m]$ that separates $K_m$
and $L_m$.

Proof. For m = 1, we can easily separate $K_1 = \{a_0\}$ and $L_1 = \{\varepsilon\}$ with the formula
$\exists x (x = x)$. For all larger m, we show that the two languages $K_m$ and $L_m$ differ on the
ordering of two (m - 1)-alternation rankers. Then by Theorem 4.5 there is an $\mathrm{FO}^2_{m,n}[<]$
formula that separates $K_m$ and $L_m$. We inductively define the rankers

    $r_2 := \triangleright a_0$                              $s_2 := \triangleright a_1$
    $r_{2i+1} := \triangleleft a_{2i}\, r_{2i}$              $s_{2i+1} := \triangleleft a_{2i}\, s_{2i}$
    $r_{2i+2} := \triangleright a_{2i+1}\, r_{2i+1}$         $s_{2i+2} := \triangleright a_{2i+1}\, s_{2i+1}$

For m = 2, it is easy to see that $r_2(u_{2,n}) < s_2(u_{2,n})$, but $r_2(v_{2,n}) > s_2(v_{2,n})$. For m > 2,
these rankers disagree on their order as well. To prove this, we prove the following two
equalities.

    $r_{2i+2}(u_{2i+2,n}) = r_{2i+1}(u_{2i+1,n}) = (2i + 1)n + r_{2i}(u_{2i,n})$

To prove this, we first use the definitions above and write

    $r_{2i+2}(u_{2i+2,n}) = (\triangleright a_{2i+1}\, r_{2i+1})(u_{2i+1,n}\, (a_{2i+1} \ldots a_0)^n)$

The letter $a_{2i+1}$ does not occur in the word $u_{2i+1,n}$, and thus $\triangleright a_{2i+1}(u_{2i+2,n})$ points to the
first position in $u_{2i+2,n}$ right after the copy of $u_{2i+1,n}$. We observe that $r_{2i+1}$ starts with
$\triangleleft$, and that $r_{2i+1}$ is defined on $u_{2i+1,n}$. Thus the evaluation of the remainder of $r_{2i+2}$ on
$u_{2i+2,n}$ never leaves the copy of $u_{2i+1,n}$, and we have

    $r_{2i+2}(u_{2i+2,n}) = r_{2i+1}(u_{2i+1,n})$

For the second part of the equality, we have

    $r_{2i+1}(u_{2i+1,n}) = (\triangleleft a_{2i}\, r_{2i})((a_0 \ldots a_{2i})^n\, u_{2i,n})$

As above, the letter $a_{2i}$ does not occur in the word $u_{2i,n}$, and thus $\triangleleft a_{2i}(u_{2i+1,n})$ points to
the position in $u_{2i+1,n}$ right before the copy of $u_{2i,n}$. The ranker $r_{2i}$ starts with $\triangleright$, and $r_{2i}$
is defined on $u_{2i,n}$. Thus, just as above, the evaluation of the remainder of $r_{2i+1}$ on $u_{2i+1,n}$
never leaves the copy of $u_{2i,n}$, and we have

    $r_{2i+1}(u_{2i+1,n}) = (2i + 1)n + r_{2i}(u_{2i,n})$
Exactly the same holds for the other rankers ($s_2, \ldots$) and words ($v_{2,n}, \ldots$). We have

    $r_{2i+2}(u_{2i+2,n}) = r_{2i+1}(u_{2i+1,n}) = (2i + 1)n + r_{2i}(u_{2i,n})$
    $s_{2i+2}(u_{2i+2,n}) = s_{2i+1}(u_{2i+1,n}) = (2i + 1)n + s_{2i}(u_{2i,n})$
    $r_{2i+2}(v_{2i+2,n}) = r_{2i+1}(v_{2i+1,n}) = (2i + 1)n + r_{2i}(v_{2i,n})$
    $s_{2i+2}(v_{2i+2,n}) = s_{2i+1}(v_{2i+1,n}) = (2i + 1)n + s_{2i}(v_{2i,n})$

Now an easy inductive argument, based on the two equalities we just proved, shows
that the rankers disagree on their order. Therefore condition (i)(b) of Theorem 4.5 fails for
any pair of words, and there is a formula in $\mathrm{FO}^2_{m,n}[<]$ that separates $K_m$ and $L_m$. □
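
The order disagreement can be checked by hand on a small instance. The sketch below evaluates rankers made of plain boundary positions, where each later boundary position is searched strictly to the right or left of the position reached so far. The names `eval_ranker`, `U31`, `V31`, `R3`, `S3` and the encoding (a boundary position as a pair of a direction string and a letter index, $a_i$ as the integer i) are illustrative assumptions. The literal words are $u_{3,1}$ and $v_{3,1}$ and the rankers are $r_3 = \triangleleft a_2 \triangleright a_0$ and $s_3 = \triangleleft a_2 \triangleright a_1$.

```python
def eval_ranker(ranker, word):
    """Evaluate a ranker on a word and return a 1-based position, or None if undefined.

    ranker: list of (direction, letter) pairs, direction ">" (right) or "<" (left).
    word:   list of letters.
    """
    pos = None
    for step, (direction, letter) in enumerate(ranker):
        if direction == ">":
            # First occurrence, strictly to the right of the current position.
            start = 1 if step == 0 else pos + 1
            candidates = range(start, len(word) + 1)
        else:
            # Last occurrence, strictly to the left of the current position.
            start = len(word) if step == 0 else pos - 1
            candidates = range(start, 0, -1)
        pos = next((i for i in candidates if word[i - 1] == letter), None)
        if pos is None:
            return None
    return pos


# u_{3,1} = a0 a1 a2 a0 a1 a0 a1 a0 and v_{3,1} = a0 a1 a2 a1 a0 a1 a0
U31 = [0, 1, 2, 0, 1, 0, 1, 0]
V31 = [0, 1, 2, 1, 0, 1, 0]
R3 = [("<", 2), (">", 0)]
S3 = [("<", 2), (">", 1)]
```

On $u_{3,1}$ the ranker $r_3$ lands to the left of $s_3$, on $v_{3,1}$ it lands to the right, which is the disagreement the proof relies on.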

Lemma 4.10. For $m \in \mathbb{N}$, $m \ge 1$, and all $n \in \mathbb{N}$, we have $u_{m,n} \equiv^2_{m-1,n} v_{m,n}$.

Proof. Because we do not have constants, there are no quantifier-free sentences. Thus
$\mathrm{FO}^2_{0,n}[<]$ does not contain any formulas and the statement holds trivially for m = 1.

For $m \ge 2$ and any $n \ge m$, we claim that exactly the same (m - 1, n)-rankers are
defined over $u_{m,n}$ and $v_{m,n}$, and that all (m - 1, n)-rankers appear in the same order with
respect to all (m - 2, n - 1)-rankers and all (m - 1, n - 1)-rankers that end on a different
direction. Once we established this claim, the lemma follows immediately from Theorem
4.5. We already observed that $u_{m,n}$ and $v_{m,n}$ are almost identical. The only difference
between the two words is that $u_{m,n}$ contains the letter $a_0$ in the middle whereas $v_{m,n}$ does
not. Thus we only have to consider rankers that are affected by this middle $a_0$.

We claim that any ranker that points to the middle $a_0$ of $u_{m,n}$ requires at least m - 1
alternations. Furthermore, we claim that any such ranker needs to start with $\triangleright$ for even m
and with $\triangleleft$ for odd m. We prove this by induction on m.

For m = 2 we have $u_{2,n} = a_0 (a_1 a_0)^{2n}$. Any n-ranker that starts with $\triangleleft$ cannot reach the
first $a_0$, thus we need a ranker that starts with $\triangleright$.

For odd m > 2 we have $u_{m,n} = (a_0 \ldots a_{m-1})^n\, u_{m-1,n}$. Any n-ranker that starts with $\triangleright$
cannot leave the first block of $n \cdot m$ symbols of this word and thus not reach the middle $a_0$.
Therefore we need to start with $\triangleleft$, and in fact use $\triangleleft a_{m-1}$ at some point, because we would
not be able to leave the last section of $u_{m-1,n}$ otherwise. But with $\triangleleft a_{m-1}$ we move past all
of $u_{m-1,n}$, and we need one alternation to turn around again. By induction, we need at
least m - 2 alternations within $u_{m-1,n}$, and thus m - 1 alternations total.

The argument for even m is completely symmetric. Thus we showed that we need at
least m - 1 alternation blocks to point to the middle $a_0$. Furthermore, we showed that if
we have exactly m - 1 alternation blocks, then the last of these blocks uses $\triangleright$. Therefore
we only need to consider (m - 1)-alternation rankers that end on $\triangleright$ and pass through the
middle $a_0$. It is easy to see that all of these rankers agree on their ordering with respect
to all other (m - 2)-alternation rankers, and with respect to all (m - 1)-alternation rankers
that end on $\triangleleft$.

To summarize, we showed that $u_{m,n}$ and $v_{m,n}$ satisfy condition (i) from Theorem 4.5
for m - 1 alternations. Thus the two words agree on all formulas from $\mathrm{FO}^2_{m-1,n}[<]$. □

Theorem 4.11 (alternation hierarchy for $\mathrm{FO}^2[<]$). For any positive integer m, there is a
$\varphi_m \in \mathrm{FO}^2[<]\text{-ALT}[m]$ and there are two languages $K_m, L_m$ such that $\varphi_m$ separates $K_m$
and $L_m$, but no $\psi \in \mathrm{FO}^2[<]\text{-ALT}[m - 1]$ separates $K_m$ and $L_m$.

Proof. The theorem immediately follows from Lemma 4.9 and Lemma 4.10. □

Theorem 4.11 resolves an open question from OH].

5. Structure Theorem and Alternation Hierarchy for $\mathrm{FO}^2[<, \mathrm{Suc}]$

We extend our definitions of boundary positions and rankers from Sect. 3 to include
the substrings of a given length that occur immediately before and after the position of the
ranker.

Definition 5.1. A $(k, \ell)$-neighborhood boundary position denotes the first or last occurrence
of a substring in a word. More precisely, a $(k, \ell)$-neighborhood boundary position is of the
form $d(s, a, t)$ with $d \in \{\triangleright, \triangleleft\}$, $s \in \Sigma^k$, $a \in \Sigma$ and $t \in \Sigma^\ell$. The interpretation of a
$(k, \ell)$-neighborhood boundary position $p = d(s, a, t)$ on a word $w = w_1 \ldots w_{|w|}$ is defined as follows.

    $p(w) := \begin{cases} \min\{ i \in [k + 1, |w| - \ell] \mid w_{i-k} \ldots w_{i+\ell} = sat \} & \text{if } d = \triangleright \\ \max\{ i \in [k + 1, |w| - \ell] \mid w_{i-k} \ldots w_{i+\ell} = sat \} & \text{if } d = \triangleleft \end{cases}$

Notice that p(w) is undefined if the sequence sat does not occur in w. A $(k, \ell)$-neighborhood
boundary position can also be specified with respect to a position $q \in [1, |w|]$.

    $p(w, q) := \begin{cases} \min\{ i \in [\max\{q + 1, k + 1\}, |w| - \ell] \mid w_{i-k} \ldots w_{i+\ell} = sat \} & \text{if } d = \triangleright \\ \max\{ i \in [k + 1, \min\{q - 1, |w| - \ell\}] \mid w_{i-k} \ldots w_{i+\ell} = sat \} & \text{if } d = \triangleleft \end{cases}$



Observe that (0, 0)-neighborhood boundary positions are identical to the boundary
positions from Definition 3.1. As before in the case without successor, we build rankers
out of these boundary positions. The size of the boundary position neighborhoods grows
linearly from the first boundary position to the last one, reflecting the remaining quantifier
depth for successor moves at those positions.
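
The two interpretations in Definition 5.1 can be sketched in a few lines. The function name `neighborhood_position`, the use of Python strings for words, and the encoding of the direction as ">" or "<" are illustrative assumptions; positions are 1-based as in the definition, and `None` stands for "undefined".

```python
def neighborhood_position(word, direction, s, a, t, q=None):
    """Interpret the neighborhood boundary position d(s, a, t) on word.

    word:      the word w as a string
    direction: ">" for first occurrence, "<" for last occurrence
    s, a, t:   left context (length k), letter, right context (length l)
    q:         optional reference position; if given, search strictly to the
               right of q (">") or strictly to the left of q ("<")
    Returns the 1-based position of the letter a, or None if undefined.
    """
    k, l = len(s), len(t)
    pattern = s + a + t
    lo, hi = k + 1, len(word) - l
    if q is not None:
        if direction == ">":
            lo = max(q + 1, lo)
        else:
            hi = min(q - 1, hi)
    # i is the 1-based position of a; the match covers w_{i-k} ... w_{i+l}.
    matches = [i for i in range(lo, hi + 1) if word[i - k - 1:i + l] == pattern]
    if not matches:
        return None
    return matches[0] if direction == ">" else matches[-1]
```

With empty contexts s and t this reduces to an ordinary boundary position, matching the remark about (0, 0)-neighborhoods.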

Definition 5.2. An n-successor-ranker r is a sequence of n neighborhood boundary positions,
$r = (p_1, \ldots, p_n)$, where $p_i$ is a $(k_i, \ell_i)$-neighborhood boundary position and
$k_i, \ell_i \in [0, i - 1]$. The interpretation of an n-successor-ranker r on a word w is defined as follows.

    $r(w) := \begin{cases} p_1(w) & \text{if } r = (p_1) \\ \text{undefined} & \text{if } (p_1, \ldots, p_{n-1})(w) \text{ is undefined} \\ p_n(w, (p_1, \ldots, p_{n-1})(w)) & \text{otherwise} \end{cases}$

We denote the set of all n-successor-rankers that are defined over a word w by $SR_n(w)$, and
we write $SR^*_n(w) := \bigcup_{i \in [1,n]} SR_i(w)$.
Because we now have the additional atomic relation Suc, we need to extend our definition
of order type as well.

Definition 5.3. Let $i, j \in \mathbb{N}$. The successor order type of i and j is defined as

    $\mathrm{ord}_S(i, j) := \begin{cases} < & \text{if } i < j - 1 \\ -1 & \text{if } i = j - 1 \\ = & \text{if } i = j \\ +1 & \text{if } i = j + 1 \\ > & \text{if } i > j + 1 \end{cases}$
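
As a small illustration of Definition 5.3, the five-way case distinction can be written as a function. The name `successor_order_type` and the string labels for the five order types are assumptions of this sketch.

```python
def successor_order_type(i, j):
    """Return the successor order type of positions i and j as a string label."""
    if i < j - 1:
        return "<"    # i is to the left of j, not adjacent
    if i == j - 1:
        return "-1"   # i is the immediate predecessor of j
    if i == j:
        return "="
    if i == j + 1:
        return "+1"   # i is the immediate successor of j
    return ">"        # i is to the right of j, not adjacent
```

Compared with the plain order type, the two extra values record adjacency, which is exactly what the Suc relation can observe.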
With this new definition of n-successor-rankers, our proofs for Lemmas 3.5, 3.6, 3.7 and
Theorem 3.8 go through with only minor modifications. Instead of working through all the
details again, we simply point out the differences.

First we notice that 1-successor-rankers are simply 1-rankers, so the base case of all
inductions remains unchanged. In the proofs of Lemmas 3.5, 3.6 and 3.7, and in the proof
of "(ii) $\Rightarrow$ (i)" from Theorem 3.8, we argued that Delilah cannot reply with a position in a
given section because it does not contain a certain ranker and therefore it does not contain
the symbol used to define this ranker. Now we need to know more: we need to show that
Delilah cannot reply with a certain letter in a given section that is surrounded by a specified
neighborhood, given that this section does not contain the corresponding successor-ranker.
Whenever Samson's winning strategy depends on the fact that an n-successor-ranker does
not occur in a given section, he has n - 1 additional moves left. So if Delilah does not reply
with a position with the same letter and the same neighborhood, Samson can point out a
difference in the neighborhood with at most (n - 1) additional moves.

For the other direction of Theorem 3.8, we need to make sure that Delilah can reply with
a position that is contained in the correct interval, has the same symbol and is surrounded
by the same neighborhood. Where we previously defined the n-ranker $s := (\lambda, \triangleright a)$ or
$s := (\rho, \triangleleft a)$, we now include the (n - 1)-neighborhood of the respective positions chosen by
Samson. Thus we make sure that Samson cannot point out a difference in the two words,
and Delilah still has a winning strategy. Thus we have the following three theorems for
$\mathrm{FO}^2[<, \mathrm{Suc}]$.

Theorem 5.4 (structure of $\mathrm{FO}^2_n[<, \mathrm{Suc}]$). Let u and v be finite words, and let $n \in \mathbb{N}$. The
following two conditions are equivalent.

(i) (a) $SR_n(u) = SR_n(v)$, and,

    (b) for all $r \in SR^*_n(u)$ and for all $r' \in SR^*_{n-1}(u)$,
        $\mathrm{ord}_S(r(u), r'(u)) = \mathrm{ord}_S(r(v), r'(v))$

(ii) $u \equiv^2_n v$

Theorem 5.5 (structure of $\mathrm{FO}^2_{m,n}[<, \mathrm{Suc}]$). Let u and v be finite words, and let $m, n \in \mathbb{N}$
with $m \le n$. The following two conditions are equivalent.

(i) (a) $SR_{m,n}(u) = SR_{m,n}(v)$, and,

    (b) for all $r \in SR^*_{m,n}(u)$ and for all $r' \in SR^*_{m-1,n-1}(u)$,
        $\mathrm{ord}_S(r(u), r'(u)) = \mathrm{ord}_S(r(v), r'(v))$, and,

    (c) for all $r \in SR^*_{m,n}(u)$ and $r' \in SR^*_{m,n-1}(u)$ such that r and r' end with different
        directions, $\mathrm{ord}_S(r(u), r'(u)) = \mathrm{ord}_S(r(v), r'(v))$

(ii) $u \equiv^2_{m,n} v$

Theorem 5.6 (alternation hierarchy for $\mathrm{FO}^2[<, \mathrm{Suc}]$). Let m be a positive integer. There
is a $\varphi_m \in \mathrm{FO}^2[<, \mathrm{Suc}]\text{-ALT}[m]$ and there are two languages $K_m, L_m \subseteq \Sigma^*$ such that $\varphi_m$
separates $K_m$ and $L_m$, but there is no $\psi \in \mathrm{FO}^2[<, \mathrm{Suc}]\text{-ALT}[m - 1]$ that separates $K_m$ and
$L_m$.

Proof. We use the same ideas as before in Theorem 4.11. We define example languages that
now include an extra letter b to ensure that the successor predicate is of no use. As before,
we inductively construct the words $u_{m,n}$ and $v_{m,n}$ and use them to define the languages $K_m$
and $L_m$.

    $u_{1,n} := b^{2n} a_0 b^{2n}$
    $u_{2,n} := u_{1,n}\, (a_1 b^{2n} a_0 b^{2n})^{2n}$
    $u_{2i+1,n} := (b^{2n} a_0 b^{2n} \ldots b^{2n} a_{2i})^n\, u_{2i,n}$
    $u_{2i+2,n} := u_{2i+1,n}\, (a_{2i+1} b^{2n} \ldots a_0 b^{2n})^n$

    $v_{1,n} := b^{2n}$
    $v_{2,n} := v_{1,n}\, (a_1 b^{2n} a_0 b^{2n})^{2n}$
    $v_{2i+1,n} := (b^{2n} a_0 b^{2n} \ldots b^{2n} a_{2i})^n\, v_{2i,n}$
    $v_{2i+2,n} := v_{2i+1,n}\, (a_{2i+1} b^{2n} \ldots a_0 b^{2n})^n$

Finally we set $K_m := \bigcup_{n \ge 1} \{u_{m,n}\}$ and $L_m := \bigcup_{n \ge 1} \{v_{m,n}\}$. Notice that the bs are not
necessary to distinguish between the two languages $K_m$ and $L_m$, and thus the proof of
Lemma 4.9 goes through unchanged and we have a formula $\varphi_m \in \mathrm{FO}^2[<, \mathrm{Suc}]\text{-ALT}[m]$
that separates $K_m$ and $L_m$. To see that no $\mathrm{FO}^2[<, \mathrm{Suc}]\text{-ALT}[m - 1]$ formula can separate
$K_m$ and $L_m$, we observe that any (n - 1)-neighborhood in the words $u_{m,n}$ and $v_{m,n}$ contains
all bs except for at most one letter $a_i$ for some $i \in [0, m - 1]$. Thus the proof of Lemma
4.10 goes through here as well. □



6. Small Models and Satisfiability for $\mathrm{FO}^2[<]$

The complexity of satisfiability for $\mathrm{FO}^2[<]$ was investigated in [4]. There it is shown
that any satisfiable $\mathrm{FO}^2_n[<]$ formula has a model of size at most exponential in n. It
follows that satisfiability for $\mathrm{FO}^2[<]$ is in NEXP, and a reduction from TILING shows that
satisfiability for $\mathrm{FO}^2[<]$ is NEXP-complete. Using our characterization of $\mathrm{FO}^2[<]$, Wilke
observed that satisfiability becomes NP-complete if we look at binary alphabets only [21].
We generalize this observation and show that satisfiability for $\mathrm{FO}^2[<]$ is NP-complete for
any fixed alphabet size. In contrast to this, satisfiability for $\mathrm{FO}^2[<, \mathrm{Suc}]$ is NEXP-complete
even for binary alphabets [4], since in the presence of a successor predicate we can encode
an arbitrary alphabet in binary. Before we state and prove the two theorems of this section,
we first prove a simple technical lemma.

Lemma 6.1. Let $u, v, v', w \in \Sigma^*$. If $v \equiv^2_n v'$, then $uvw \equiv^2_n uv'w$.

Proof. We argue that Delilah has a winning strategy for the game $\mathrm{FO}^2_n(uvw, uv'w)$: If
Samson places a pebble in u or w, Delilah replies with the identical position in u or w in
the other structure. If Samson places a pebble in v or v', Delilah replies according to her
winning strategy in the game $\mathrm{FO}^2_n(v, v')$. All of these moves obviously preserve the ordering
of the pebbles, and thus Delilah wins. □

Theorem 6.2 (Small Model Property for Bounded Alphabets). Let $n \in \mathbb{N}$ and let $\varphi \in
\mathrm{FO}^2_n[<]$ be a formula over a k-letter alphabet. If $\varphi$ is satisfiable, then $\varphi$ has a model of size
$O(n^k)$.

Proof. Let w be an arbitrary model of $\varphi$. We use induction on k to show how to construct
a new model of size $O(n^k)$ that satisfies $\varphi$. For k = 1, i.e. a single letter alphabet, we
observe that an n-ranker can only point to a position within the first or last n letters of w.
We let w' be a copy of w with all letters after the first n letters and before the last n letters
removed. The words w and w' agree on the existence and ordering of all n-rankers, thus we
can apply Theorem 3.8 and it follows that $w' \models \varphi$.

For the inductive case, we partition $w$ into segments, where each segment is a maximal
sequence of the same letter. For example, the word $aaabb$ has two segments, $aaa$ and $bb$.
First, we let $w'$ be a copy of $w$ where we cut down all segments that are longer than $2n$ to
exactly $2n$ letters. Since no $n$-ranker can point to a position within any segment after the
first $n$ letters and before the last $n$ letters of that segment, we have $w' \models \varphi$.
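The truncation step can be sketched in a few lines of Python. This is only an illustration of the construction of $w'$ from $w$; the function name `truncate_segments` is hypothetical, and words are modelled as plain strings.

```python
from itertools import groupby


def truncate_segments(w: str, n: int) -> str:
    """Cut every segment (maximal run of one letter) longer than 2n down to 2n letters."""
    pieces = []
    for letter, run in groupby(w):
        length = sum(1 for _ in run)  # length of this maximal run
        pieces.append(letter * min(length, 2 * n))
    return "".join(pieces)
```

For instance, with $n = 2$ the word `aaaaabb` becomes `aaaabb`: the segment of five `a`s is cut to four letters, and the segment `bb` is left unchanged. On a single-letter alphabet the whole word is one segment, so the same function also covers the base case $k = 1$.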

Now we partition the word $w'$ such that $w' = u_1 s_1 u_2 \cdots u_r s_r u_{r+1}$, where $r \in \mathbb{N}$ and
for every $1 \le i \le r$, $u_i$ is a string of maximal length that uses exactly $k$ different letters,
$s_i$ is a segment, and $u_{r+1}$ is a string over at most a $k$-letter alphabet. We observe that
this partition is unique: If $a$ is the last of the $(k+1)$ letters in our alphabet to appear in
$w'$, starting from the left, then $s_1$ is the left-most segment of $a$'s, and $u_1$ is everything up
to that segment. Now $s_2$ is the left-most segment after $s_1$ of the letter that appears last
after $s_1$, and so on. We can point to a position in segment $s_n$ with an $n$-ranker, but no
$n$-ranker that starts with $\triangleright$ can point to a position to the right of $s_n$. Similarly, we partition
$w'$, now starting from the right, such that $w' = v_{q+1} t_q v_q \cdots v_2 t_1 v_1$, where $q \in \mathbb{N}$ and for
every $1 \le i \le q$, $v_i$ is a string of maximal length that uses exactly $k$ different letters, $t_i$ is
a segment, and $v_{q+1}$ is a string over at most a $k$-letter alphabet. Again, this partition is
unique and any $n$-ranker that starts with $\triangleleft$ cannot point to a position to the left of $t_n$. We
also notice that both partitions have the same number of segments, i.e. $r = q$, since any
substring $u_i s_i$ from the first partition contains all letters of the alphabet and thus has to
contain at least one segment $t_j$ from the second partition, and vice versa.
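The two partitions can be computed greedily. The Python sketch below is one possible reading of the construction, under the assumption that the word is a string over the given $(k+1)$-letter alphabet; the names `left_partition` and `right_partition` are hypothetical. The right-to-left partition is obtained by mirroring the word.

```python
from typing import List, Tuple


def left_partition(w: str, alphabet: str) -> Tuple[List[Tuple[str, str]], str]:
    """Return ([(u_1, s_1), ..., (u_r, s_r)], u_{r+1}) for the left-to-right partition."""
    size = len(set(alphabet))
    parts = []
    pos = 0
    while True:
        seen = set()
        i = pos
        # Scan until the last missing letter of the alphabet shows up at index i.
        while i < len(w):
            seen.add(w[i])
            if len(seen) == size:
                break
            i += 1
        if i == len(w):
            # The remainder misses at least one letter: it is u_{r+1}.
            return parts, w[pos:]
        # The segment s is the maximal run of the letter found at index i.
        j = i
        while j < len(w) and w[j] == w[i]:
            j += 1
        parts.append((w[pos:i], w[i:j]))
        pos = j


def right_partition(w: str, alphabet: str) -> Tuple[List[Tuple[str, str]], str]:
    """Return ([(v_1, t_1), ..., (v_q, t_q)], v_{q+1}) for the right-to-left partition."""
    parts, rest = left_partition(w[::-1], alphabet)
    return [(v[::-1], t[::-1]) for v, t in parts], rest[::-1]
```

On the word `abbaab` over the alphabet `ab`, the left-to-right partition is $u_1 = a$, $s_1 = bb$, $u_2 = aa$, $s_2 = b$ with empty $u_3$, and the right-to-left partition is $v_1 = b$, $t_1 = aa$, $v_2 = bb$, $t_2 = a$ with empty $v_3$, so both have two segments.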

If both partitions use more than $2n$ segments, then the segment $s_n$ of the first partition
occurs to the left of the segment $t_n$ of the second partition. In this case we construct the
word $w'' = u_1 s_1 u_2 \cdots u_n s_n t_n v_n \cdots v_2 t_1 v_1$. The word $w''$ agrees with $w'$ on all $n$-rankers, and thus
$w'' \models \varphi$. Every one of the strings $u_1, \ldots, u_n$ and $v_1, \ldots, v_n$ uses at most $k$ different letters,
therefore we can apply the inductive hypothesis and replace each of these strings with an
equivalent string of length $O(n^k)$, as explained in Lemma 6.1. Thus we have constructed a
word of length $O(n^{k+1})$ that satisfies $\varphi$.

If the partitions have at most $2n$ segments, then we combine the two partitions such
that $w' = w_1 x_1 \cdots x_p w_{p+1}$, where $p \le 4n$, and for every $1 \le i \le p$, $x_i$ is one of the original
segments $s_1, \ldots, s_r$ and $t_1, \ldots, t_q$. As above, we use the inductive hypothesis to replace all
strings $w_i$ with equivalent strings of length $O(n^k)$, and thus construct a new string of length
$O(n^{k+1})$ that satisfies $\varphi$. □
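To make the size accounting concrete, the following Python sketch evaluates one possible explicit bound. The constants are our reading of the proof and are not stated in it: we assume a bound of $2n$ for a single-letter alphabet, and that each inductive step combines at most $4n + 1$ strings of the previous bound with at most $4n$ segments of length at most $2n$. The function name `size_bound` is hypothetical.

```python
def size_bound(n: int, k: int) -> int:
    """Explicit upper bound on the model size for quantifier depth n and k letters.

    Assumed recurrence (an illustration, not taken from the proof verbatim):
        S(1)     = 2n
        S(k + 1) = (4n + 1) * S(k) + 4n * 2n
    """
    bound = 2 * n  # single-letter alphabet: keep the first n and the last n letters
    for _ in range(k - 1):
        bound = (4 * n + 1) * bound + 4 * n * 2 * n
    return bound
```

Under these assumptions, `size_bound(3, 2)` evaluates to $13 \cdot 6 + 72 = 150$. For fixed $k$, each step multiplies the previous bound by a factor linear in $n$, which matches the stated growth from $O(n^k)$ to $O(n^{k+1})$.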

Theorem 6.3. Satisfiability for $\mathrm{FO}^2[<]$ where the size of the alphabet is bounded by some
fixed $k \ge 2$ is NP-complete.

Proof. Membership in NP follows immediately from Theorem 6.2: we nondeterministically
guess a model of size $O(n^k)$, where $n$ is the quantifier depth of the given formula, and verify
that it is a model of the formula. Now we give a reduction from SAT. Let $\alpha$ be a boolean
formula in conjunctive normal form over the variables $X_1, \ldots, X_n$. We construct an $\mathrm{FO}^2[<]$
formula $\varphi$ as the conjunction of two formulas. The first says that every model has size
exactly $n$. The second is obtained from $\alpha$ when
we replace every occurrence of $X_i$ in $\alpha$ with a formula of length $O(n)$ which says that the
$i$-th letter is a 1. The total length of $\varphi$ is $O(|\alpha| \cdot n)$, and $\varphi$ is satisfiable iff $\alpha$ is satisfiable. □
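The correspondence behind the reduction, between truth assignments to $X_1, \ldots, X_n$ and words of length $n$ over $\{0, 1\}$, can be illustrated with a short Python sketch. It does not build the $\mathrm{FO}^2[<]$ formula itself; it only evaluates a CNF formula on a word, reading $X_i$ as "the $i$-th letter is a 1". The clause encoding (signed integers, as in DIMACS) and the names `word_satisfies` and `satisfiable_by_some_word` are assumptions made for this example.

```python
from itertools import product
from typing import List


def word_satisfies(cnf: List[List[int]], word: str) -> bool:
    """Evaluate a CNF formula on a 0/1 word; literal +i means X_i, -i means not X_i."""
    return all(
        any((word[abs(lit) - 1] == "1") == (lit > 0) for lit in clause)
        for clause in cnf
    )


def satisfiable_by_some_word(cnf: List[List[int]], n: int) -> bool:
    """Brute-force search over all words of length exactly n (exponential in n)."""
    return any(
        word_satisfies(cnf, "".join(bits)) for bits in product("01", repeat=n)
    )
```

For example, the formula $(X_1 \vee \neg X_2) \wedge X_2$, encoded as `[[1, -2], [2]]`, is satisfied by the word `11` but not by `01`. The exhaustive search is only there to check small cases; the reduction itself constructs $\varphi$ in polynomial time and performs no search.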

7. Conclusion 

We proved precise structure theorems for $\mathrm{FO}^2$, with and without the successor predicate,
that completely characterize the expressive power of the respective logics, including exact
bounds on the quantifier depth and on the alternation depth. Using our structure theorems,
we showed that the quantifier alternation hierarchy for $\mathrm{FO}^2$ is strict, settling an open
question from [U |4]. Both our structure theorems and the alternation hierarchy results add
further insight to and simplify previous characterizations of $\mathrm{FO}^2$. We hope that the insights
gained in our study of $\mathrm{FO}^2$ on words will be useful in future investigations of the trade-off
between formula size and number of variables.